Caching
-------
A cache is to an application what a palette is to a painter. A set of data is kept temporarily in the cache while the application runs, the same way a painter puts a few colors on the palette while working on a painting. Data may be combined and modified in the cache, just as colors are mixed on the palette. The cache keeps frequently used data in memory and saves the application from going to the disk drive or database on every access, which saves time. The palette holds the most frequently used colors and reduces the painter's trips to the color tubes. Of course, no analogy goes all the way. In this case, cached data does change and needs to be stored back, but a painter doesn't pour a newly mixed color he likes back into a tube.
Memory is several orders of magnitude faster than disk drives and databases. By keeping some data in memory and avoiding disk access, an application can run several times faster. Of course, if requests are so random that they rarely repeat, and/or the data set is extremely large compared with the memory size, the cache may be poorly utilized and just add overhead. But that should be treated as the exception.
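To make the idea concrete, here is a minimal sketch of a single-machine read-through cache: look in memory first, and only fall back to the slow source (disk or database) on a miss. The class and the `loader` parameter are illustrative names, not from any particular library; eviction is handled with `LinkedHashMap`'s built-in LRU ordering.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// A read-through cache sketch: hits are served from memory, misses invoke
// the expensive loader (e.g. a database query) and keep the result.
// Capacity is bounded with LRU eviction via an access-ordered LinkedHashMap.
class ReadThroughCache<K, V> {
    private final Map<K, V> entries;
    private final Function<K, V> loader; // the slow fetch we want to avoid

    ReadThroughCache(int capacity, Function<K, V> loader) {
        this.loader = loader;
        // accessOrder = true makes iteration order least- to most-recently used
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity; // evict the least-recently-used entry
            }
        };
    }

    synchronized V get(K key) {
        return entries.computeIfAbsent(key, loader); // load only on a miss
    }
}
```

A second `get` for the same key never touches the loader, which is exactly the time saving described above.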
If multiple machines are used and data needs to be stored back, keeping data in the cache of each machine becomes a challenge. With one machine, we always know whether the data is up to date: if we modified it, it is the new data and needs to be stored back; if we didn't, it is current. With multiple machines, even if we didn't modify the data, some other machine might have. A synchronization mechanism is needed, and it must handle machines that try to modify the same data at the same time. Distributed caches are designed for exactly that. In the Java world, multiple JCache implementations are available.
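The concurrent-modification problem a distributed cache must solve can be sketched inside one JVM: two writers race on the same entry, and instead of letting the last write silently win, each writer does an atomic compare-and-swap and retries if someone else got there first. Real distributed caches do the equivalent across machines; the class and method names below are illustrative only.

```java
import java.util.concurrent.ConcurrentHashMap;

// Sketch of conflict handling on a shared entry: re-read, compute,
// and replace atomically, retrying when another writer intervened.
class VersionedStore {
    private final ConcurrentHashMap<String, Integer> data = new ConcurrentHashMap<>();

    void put(String key, int value) { data.put(key, value); }

    // Increment atomically: retry the read-modify-write on conflict.
    int increment(String key) {
        while (true) {
            Integer current = data.get(key); // assumes the key is present
            Integer next = current + 1;
            if (data.replace(key, current, next)) {
                return next;
            }
            // another writer changed the entry between get and replace; retry
        }
    }
}
```

The retry loop is the essence of optimistic synchronization; a distributed cache layers the same idea over the network, where the "other writer" is another machine.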
Tangosol Coherence appears to be the leader in this space and claims deployed customers across multiple industries.
Turning the cache off can be a painful answer: it means we now need several times more machines just to achieve the performance we had with one. One strategy that helps is caching at the data-store level. It is like having a palette, but instead of carrying it around, it is fixed to the table. This is often what a "Shared-Nothing Architecture" does.
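The data-store-level cache idea can be reduced to a small sketch: application workers keep no local copy at all, so there is nothing to keep coherent; they all read and write one shared tier. In a real deployment that tier would be a networked cache service in front of the database; here it is collapsed into a single shared map for illustration, and all names are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// The "palette fixed to the table": one shared cache tier,
// no per-worker copies, hence no coherence problem.
class SharedCacheTier {
    private final ConcurrentMap<String, String> store = new ConcurrentHashMap<>();

    String get(String key)              { return store.get(key); }
    void put(String key, String value)  { store.put(key, value); }
}

class Worker {
    private final SharedCacheTier tier; // no local cache: nothing to synchronize

    Worker(SharedCacheTier tier) { this.tier = tier; }

    String lookup(String key) { return tier.get(key); }
}
```

The trade-off is that every lookup pays a trip to the shared tier, which is why the post argues per-machine distributed caches are still worth the added synchronization cost.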
Distributed caching is relatively new and requires additional integration. I envision that the distributed cache will become an integrated part of the application server in the future, included in the J2EE and .NET offerings. I also saw that ActiveGrid, a LAMP stack company, posted a job for an engineer to implement a distributed cache.
In my opinion, the distributed cache is preferable to the shared-nothing architecture, and it will be the model of the future. I am actually developing one myself as a hobby. We will see how the industry unfolds.
Tag: clustering, cluster cache, distributed cache, database, grid, virtualization