Caching
-------
Cache is to an application what a palette is to a painter. A set of data is temporarily put in the cache as the application runs, the same way a painter picks a few colors onto the palette as he works on the painting. Data might be combined and modified in the cache, the same way colors are mixed on the palette. The cache keeps some frequently used data in memory and saves the application from accessing the disk drive or database all the time, which saves time; the palette keeps the most frequently used colors and reduces the painter's trips to the paint tubes. Of course, no analogy goes all the way. In this case, data does change and needs to be stored back, but a painter doesn't put a color back into a tube when he discovers a new color that he likes.
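To make the analogy concrete, here is a minimal sketch of such a cache in Java, an LRU policy built on LinkedHashMap. It is an illustration of the idea, not any particular product's implementation:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * A minimal in-memory cache: keeps the most recently used entries
 * and evicts the least recently used one when capacity is exceeded.
 */
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruCache(int capacity) {
        // accessOrder = true re-orders entries on every get(), giving LRU behavior
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // For modified ("dirty") data, a real cache would write the
        // evicted entry back to disk or the database at this point.
        return size() > capacity;
    }
}
```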
Of course, high-volume computing didn’t stop at real-time updates. Let’s continue!
Stateful Session
----------------
Applications like Y! Mail can use the stateless session approach to achieve scaling. However, in some applications it is highly desirable, from a programming perspective, to have a stateful session. Information about the currently logged-in user is a good candidate to be stored in a session. Some sites might enable GUI hints to allow editing of certain attributes. In other cases, a stateful session enables easy design of web applications that involve multiple steps. A questionnaire might involve multiple pages; with a stateful session, the previous pages' answers can be retained in the session. Some new, highly interactive sites that provide full desktop-like experiences use stateful sessions heavily to store the current active view, opened documents, table sorting order, etc. A session does not tend to survive forever; it is often discarded after a timeout or when the user logs off. Scaling such applications requires different strategies. (Side note: many developer sites use the shopping cart as an example of a stateful session. However, it is probably not the right approach. A user might log out for various reasons, but it is often desirable to have the cart still there when the user logs back in.)

An application server or web server framework is often used for session management, and scalability is often built into the application server or framework itself. For web applications, HTTP load-balancers (dedicated hardware) are often used to aid the task. An HTTP load-balancer understands HTTP session IDs or rewritten URLs and always reroutes the same client to the same server (unless the server fails), such that the session can live on that physical server. Failover of sessions has been a major selling point of application servers.

It is important to note that the functionality of a stateful session can be simulated by moving all the transient state to the browser (probably as hidden fields in an HTML form), by always storing it back to the data layer, or both. In a way, the stateful session can be viewed as a "division" strategy: by factoring out data that is transient and short-lived, and keeping it in memory, it reduces hits to the data layer and increases performance. The division also enables the use of HTTP load-balancers.
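As a small illustration of the questionnaire example, here is a sketch using the standard servlet session API; the servlet class and its parameter names are made up for illustration:

```java
import java.io.IOException;

import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpSession;

/**
 * Sketch of a multi-page questionnaire that retains each page's
 * answer in the server-side session, as described above.
 */
public class QuestionnaireServlet extends HttpServlet {
    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        HttpSession session = req.getSession(true); // create the session if absent
        // Store this page's answer under a per-page key.
        session.setAttribute("answer-" + req.getParameter("page"),
                             req.getParameter("answer"));
        // Answers from previous pages remain available for later steps.
        Object firstAnswer = session.getAttribute("answer-1");
        resp.getWriter().println("Answer to page 1 so far: " + firstAnswer);
    }
}
```

An HTTP load-balancer with session affinity would keep routing this user back to the server that holds this session.

Tag: clustering, database, grid, virtualization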
Database Driven
---------------
A database provides querying, indexing, transactions, storage management, data security, and other support. Database-driven applications are very common: a database tier is inserted between the application and the storage. Query optimization and indexing enhance performance and help the application achieve higher volume. However, the database's power and convenience usually turn into extra features of the application and rarely leave users with an impression of performance. A share-nothing strategy can be used. Bandwidth between the application tier and the database tier is critical to overall performance; before gigabit Ethernet became common, high-volume applications used multiple 100Mbps Ethernet cards per machine to achieve the desired bandwidth.
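For illustration, here is a sketch of such a database-driven lookup through plain JDBC, letting the database's index and query optimizer do the heavy lifting instead of application code. The connection URL, credentials, and schema are hypothetical:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

/** Sketch of a database-driven lookup; URL, credentials, and schema are made up. */
public class StockQuery {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                "jdbc:mysql://db-host/shop", "app", "secret");
             PreparedStatement ps = con.prepareStatement(
                "SELECT name, quantity FROM item WHERE sku = ?")) {
            // With an index on item(sku), the database answers this without a
            // full scan:  CREATE INDEX idx_item_sku ON item (sku);
            ps.setString(1, args[0]);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString("name") + ": "
                            + rs.getInt("quantity"));
                }
            }
        }
    }
}
```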
EJB
---
In the J2EE world, EJB was famous for its EntityBean concept. An EntityBean is a component-model representation of a single entity (a relational database table) in a database. (Loosely speaking, a component is an object whose lifecycle is managed by a container.) With this model, an application is divided into two tiers: the presentation tier and the business logic tier; the EJB model represents the latter. Direct access (through RMI or RPC) by clients pretty much went out of fashion. The presentation tier is most likely a web server that serves HTML or exposes the business logic functionality as a Web Service. The scaling and load-balancing of entities are enabled by the remote Stub and Skeleton model. Clustering is provided by the application server with the EJB container; the stub on the client side, when instantiated, is assigned to one of the servers.

The database access (persistence) of the entity is sometimes provided by the container. However, hardly a serious application developer didn't complain about the performance of this container-managed persistence (CMP). The first EJB design was simply too heavyweight. EJB 3.0 moves to a much lighter-weight model, which appears to be good enough for long-term success. However, a very wide selection of programming models and frameworks exists.
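For a taste of the lighter-weight model, here is a sketch of an EJB 3.0-style entity: a plain annotated class (a JPA entity) instead of the old heavyweight EntityBean interfaces. The class and its fields are illustrative:

```java
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;

/**
 * Sketch of an EJB 3.0 / JPA entity: the container persists this plain
 * object to a relational table without EntityBean boilerplate.
 */
@Entity
public class Account {
    @Id
    @GeneratedValue
    private Long id;

    private String country;

    public Long getId() { return id; }
    public String getCountry() { return country; }
    public void setCountry(String country) { this.country = country; }
}
```

Tag: clustering, database, grid, virtualization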
I still remember that an instructor at my college favored token ring and thought it was the better technology. He envisioned that if token ring had had better marketing at the time, it would have dominated the market. The cost difference, he argued, would disappear because of the huge economy of scale once a technology like that became mainstream. But it was too late; Ethernet had already dominated the market. I thought he was right.
That was eight years ago. In the end, both protocols vanished. People still refer to Ethernet all the time, but is Ethernet really Ethernet anymore?

Since the market moved to 100Mbps, the concept of the hub has phased out. Because a hub does not wait for the whole frame (layer-two packet) to be transmitted before forwarding it, a hub has the advantage of lower latency. However, that advantage becomes less significant as the speed increases tenfold: when the frame size remains the same, the latency is cut by 90%.

The beauty, or uniqueness, of Ethernet actually happens at layer 2: wait a random period, send the frame, do collision detection, and resend if necessary. Now, Ethernet is connected port to port from a computer to the switch. All Ethernet is full duplex, and Cat-5 twisted-pair cable uses different wires for transmission and reception, which essentially guarantees that no collision is possible. Say goodbye to the ALOHA-style protocol that powered corporate networks for more than 10 years.

Using a switch actually brings a significant performance advantage. In a busy network, the theoretical throughput is about 57% (if I remember correctly). With full duplex and the use of a switch, we are getting pretty close to 200Mbps.

Most switches (at least entry-level ones) use a 5-port switch chip, so the interconnection is likely to be a star network, and the MAC-layer packet is forwarded exactly once to reach its destination. An 8-port switch can be constructed by combining two such chips, with one port of each chip connected at the circuit-board level; a packet is forwarded 1 to 2 times. A 16-port switch can be constructed from 5 such chips, with one port of each of the 4 outer chips connecting to a port on the 5th chip; a packet is forwarded 1 or 3 times. The tree network won the final battle.

Cluster
-------
When the volume cannot be satisfied by waiting and division, and it is more economical to spend engineers' time than to buy a massive computer, multiple machines are used to form a cluster, with the software and configuration modified to accommodate them.
Static data
-----------
Scaling static data to high volume is the easiest: put up multiple machines, replicate the data, and have them serve clients randomly.
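A sketch of that random routing in Java (the host names are made up):

```java
import java.util.List;
import java.util.Random;

/**
 * Sketch of the simplest scaling scheme: identical replicas of static
 * data, with each request routed to a random one.
 */
public class RandomRouter {
    private static final List<String> REPLICAS = List.of(
            "static1.example.com", "static2.example.com", "static3.example.com");
    private static final Random RANDOM = new Random();

    public static String pickReplica() {
        return REPLICAS.get(RANDOM.nextInt(REPLICAS.size()));
    }

    public static void main(String[] args) {
        System.out.println("route request to: " + pickReplica());
    }
}
```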
Non-real-time updates
---------------------
Data in most useful applications changes; however, it doesn't necessarily change frequently. Even if the data changes all the time, a user may not need to be given the most up-to-date version of it. Those applications can be divided onto two sets of computers: low-hit updates and high-hit queries. Updates are done as frequently as needed to the update set. All query hits route to the second set, which has its own copy of the data, a snapshot of the past; it scales the same way as static data. Once in a while, changes are aggregated from the first set and applied to the second set as a batch operation. The batch operation can be done periodically, or during a lower-traffic period.

The majority of high-volume applications fall into this category. For example, USPS tracking is used by thousands of users every day. The updates are brought from the low-hit set to the high-hit set every night. During the day, no matter how many times you refresh the page, it is last night's data that shows in your browser. UPS updates the data much more frequently, but it is still not real time: the "delivered" status showed on the site only an hour after I had signed for my package. Many e-tailers show their stock status as "in stock", "low", or "out-of-stock", but the status is not updated instantly, only a few times a day, even though real-time data would be highly desirable.

Likewise, search engines are divided into an update set and a query set. In this case, both sets require sophisticated clustering strategies themselves. Readers can refer to The Anatomy of a Search Engine for details.

There is another reason this approach is so popular: the low-hit set (which might well be one computer) often already exists before the business goes online. For example, a computer store might already have its stock management system. Adding a new set of computers to serve web customers' queries about stock makes a lot of sense and brings minimal modification and burden to the existing system.
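Here is a sketch of the periodic batch operation; the copy step is a hypothetical placeholder for whatever bulk transfer (dump/load, replication job, ETL) a real system would use:

```java
import java.util.Timer;
import java.util.TimerTask;

/**
 * Sketch of the update-set / query-set split: queries are served from a
 * snapshot, and a periodic batch job copies accumulated changes over.
 */
public class NightlyBatchSync {
    static void copyChangesToQuerySet() {
        // Placeholder: aggregate changes from the low-hit update set
        // and apply them to the high-hit query set in one batch.
        System.out.println("aggregating changes into the query set...");
    }

    public static void main(String[] args) {
        long oneDayMs = 24L * 60 * 60 * 1000;
        // In practice, this would be scheduled during a low-traffic period.
        new Timer("batch-sync").scheduleAtFixedRate(new TimerTask() {
            @Override
            public void run() {
                copyChangesToQuerySet();
            }
        }, 0, oneDayMs);
    }
}
```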
Real-time updates
-----------------
When both real-time data and high volume are needed, customized software is required. For example, web mail falls into this category. Scaling mail is a common enough problem that finished, customized software for mass mail is relatively easy to find. However, early companies like Yahoo highly customized their mail servers to achieve the volume. Yahoo employed a number of strategies to scale, including partitioning, division, and memory-only data.

First, Yahoo partitioned its servers by geographic location. A user id is unique globally, except for a few countries; Japan is one such exception. For most sites, like Y! Mail, logging in through another country's page will route you back to the server of your country. Even though the Y! id is unique, the email addresses are not interchangeable. For example, sending email to helloworld@yahoo.ca will bounce if the account was opened with yahoo.com.

The second strategy is to use an in-memory database. Y! has 300 million users worldwide, the last time I heard. Even with this huge number, it is still possible to store the entire id list in the memory of a single server. Only the most primitive data is stored along with the id, such as the country code. Of course, the data itself is persisted back to storage and is fully backed up. The list changes relatively slowly. Multiple such servers are used, and the other servers query against them. The data replicates among them; it takes a few minutes for a newly created account to propagate globally, which in this case is acceptable.

To make a useful app, the in-memory database doesn't provide enough information, and some data changes more rapidly: when did the user log in last, when does the session expire, has the user logged out. Another layer of division is used: the login server. The number of users currently logged in is far smaller than the total number of users, even though Yahoo allows a user to log in several times simultaneously from multiple browsers or computers. A cluster of computers is used just to keep track of logged-in sessions. Web apps like Y! Mail check the page cookies against a logged-in session to determine whether an operation is valid. The login server cluster is high-volume itself.

The application itself is handled by another cluster of servers. Those servers are designed to be stateless, so any operation can be randomly routed to any of the servers to serve a user request. It poses some limitations and challenges to the developers of the application, but it works.
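As a sketch of the login-server idea, here is a memory-only map of active sessions that stateless app servers could check. This illustrates the concept only; it is not Yahoo's actual design:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Sketch of a login server: a memory-only map from session cookie to
 * user id, far smaller than the full user list.
 */
public class LoginServer {
    private final Map<String, String> activeSessions = new ConcurrentHashMap<>();

    public void logIn(String cookie, String userId) {
        activeSessions.put(cookie, userId);
    }

    public void logOut(String cookie) {
        activeSessions.remove(cookie);
    }

    /** Stateless app servers call this to check whether a request's
     *  cookie belongs to a live session; returns null if not logged in. */
    public String validate(String cookie) {
        return activeSessions.get(cookie);
    }
}
```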
Scaling: The Share-Nothing Architecture
---------------------------------------
The Y! Mail architecture captures the core design of the "Share-Nothing Architecture". The increasingly popular LAMP (Linux, Apache, MySQL, PHP/Python) stack uses this as its scaling model, and the same can be said of Ruby on Rails.

Another layer that requires clustering itself is storage. Because the application itself is stateless, all data comes directly from the storage layer, which therefore receives extremely high hits. Cache and RAID techniques are certainly used. Y! uses a NetApp solution for scaling its storage layer.

Tag: clustering, database, grid, virtualization
It has been a long way coming: I started my programming career as the technical lead of an object-relational mapping product, Castor JDO. Concurrent programming, transactions, real-time data, data integrity, caching, distributed computing, clustering, scalability, load-balancing, high availability, and high volume have interested me ever since that first job. All of these are old terminologies that have existed since the early days of computing; I call them the essences of Enterprise Computing. Yet, after all these years, such software is still a huge challenge and a great endeavor.
High-volume
-----------
High-volume applications aren't new at all, but the web makes many applications become high-volume. In the early days, high volume was often handled by massive computers and highly customized software. Just like water always finds its way downward, the market finds a cheaper way to do what needs to be done. As water gathers in a meadow and the water level rises, it finds (or breaks out) new and faster ways to flow down. As high-volume applications become common, new and cheaper ways are designed to handle them.
One machine
-----------
The first method to scale is to fully utilize one machine: add more memory, faster disks, and more CPUs if the system supports it. Multi-core and multi-CPU systems have become common. Sun provides some really good resources in this area; you can begin with this video: Sun Nettalk (Scale).
Wait
----
The second cheapest solution to scale to higher volume is sometimes to wait. Of course, that is unacceptable in most cases. But computers double in speed every 1.5 years, and the market constantly introduces new technologies that help increase system performance. Who said "waiting never solves the problem"? The same apps will run twice as fast if you spend the same hardware cost again 18 months later. If your volume doubles less frequently than every 1.5 years, your problem eventually solves itself.

Making software is very expensive. Spending a man-month of engineering time costs about the same as a decent mid-range server, and one man-month doesn't really get that much done. This is especially true if the software developed cannot be reused. Good engineers are always overworked and should be considered a limited resource. However, salary is often considered a fixed cost to the business, so management sometimes favors spending engineer time over buying hardware. Of course, acquired hardware isn't maintenance-free and incurs its own cost.
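A back-of-the-envelope sketch of that arithmetic: the 18-month hardware doubling is from above, while the 24-month volume doubling is an assumed example.

```java
/**
 * Sketch of the "wait" strategy: if hardware speed doubles every 18
 * months while traffic doubles every 24 months, the headroom per
 * machine grows over time. The numbers are illustrative assumptions.
 */
public class WaitToScale {
    public static void main(String[] args) {
        double hardwareDoublingMonths = 18.0;
        double volumeDoublingMonths = 24.0;
        for (int month = 0; month <= 72; month += 18) {
            double speedup = Math.pow(2, month / hardwareDoublingMonths);
            double volume = Math.pow(2, month / volumeDoublingMonths);
            System.out.printf("month %2d: speed x%.2f, volume x%.2f, headroom x%.2f%n",
                    month, speedup, volume, speedup / volume);
        }
    }
}
```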
Division
--------
Division is arguably the most important aspect of high-volume computing. As you will see from the following paragraphs and later blog posts, high performance often comes from the right division. The first step is to divide different services onto different machines; for example, move the HTTP service off the machine that serves mail. The second is to partition a service along natural boundaries; for example, mail might belong to a site heavy enough to be singled out onto another machine. Going deeper, you might separate the HTTP server that serves images from the server that serves HTML pages.

An application might also be divided according to its scaling behavior. An application that serves static information is the easiest to scale: the static part can be factored out onto a different machine or set of machines, leaving as much CPU as possible to the part that changes frequently. Move storage onto a separate machine or dedicated hardware. Adopt a multi-tier architecture and split the tiers onto machines with high-speed network connections.

Tag: clustering, database, grid, virtualization
My first computer came with MS-DOS 3.1. Books were very expensive for a 5th-grade kid, there was no internet, and the "/?" parameter wasn't incorporated into the DOS commands. Knowing only the "cd", "c:", and "echo" commands, my menus were for display only: I made a batch of single-letter-named batch files to achieve menu-like behaviour. Not long after my first menu system was done, I got a new game from a friend. The new game became one of my favorites, and I wanted to put it ahead of the other games. I had to rename each batch file and edit each of the menu's "echo" lines. Before long, I gave up the idea of having a menu. It was my first encounter with software maintainability: even for a simple menu like this, the cost of ownership was beyond the time spent creating the system in the first place.
Ok, enough stories of the good old days. :-)

Tag: maintainability
I still remember the day my mother bought us our first PC to share with my brother and sister. I was in grade 5, if I remember correctly.
First post! This blog will be a placeholder for my thoughts on computer/software engineering, development, and other technology issues.
The title of this blog was inspired by a quote from Burt Rutan, the architect of SpaceShipOne: "Complexity we remove can never fail." Burt is definitely not the first to stress simplicity. Einstein left this famous quote half a century ago: "Things should be made as simple as possible, but not simpler." Another famous quote, from anonymous, is "KISS – Keep It Simple, Stupid!"

While I totally believe simplicity is the way to go, I don't think this quote is as accurate as the quotes from Burt or Albert, at least in the context of engineering. In my experience, simplicity is not likely to occur naturally. Usually, the first draft of my software is more complicated than necessary. Keeping it is not enough; we need to take additional steps to pursue simplicity.

The first version is often like the path we draw the first time we try to solve a large maze. The first path is often not the shortest; it takes additional steps to remove the unnecessary ones. Simplicity is not a mere matter of "keeping". It is a matter of "removing"; it is a matter of "making".