Last month (November 2001) I concluded that in ASP.NET, caching is the key to performance if you want to exploit Web controls and maintain optimal server response times. Caching relates directly to applications that can work disconnected from the data source. Not all applications can afford this. Applications that work in a highly concurrent environment that need to detect incoming changes to data can't be adapted to work disconnected. However, there are scenarios where you have a large block of user-specific data that needs to be analyzed, sorted, aggregated, scrolled, and filtered. In this case, your users need to extrapolate numbers and trends, but aren't interested in the last-minute record. In this case, server-side caching can be a key advantage.
Data caching can mean two things. You can temporarily park your frequently used data into memory data containers, or you can persist them to disk on the Web server or a machine downstream. But what is the ideal format for this data? And what is the most efficient way to load it back into an in-memory binary usable format? These are the questions I will answer this month.
ADO.NET and XML
ADO.NET and XML are the core technologies that help you design an effective caching subsystem. ADO.NET provides a namespace of data-oriented classes through which you can build a rough but functional in-memory DBMS. XML is the input and output language of this subsystem, but it's much more than the language used to serialize and deserialize living instances of ADO.NET objects. If you have XML documents formatted like data—hierarchical documents with equally sized subtrees—you can synchronize them to ADO.NET objects and use both XML-related technologies and relational approaches to walk through the collection of data rows. Although ADO.NET and XML are tightly integrated, only one ADO.NET object has the ability to publicly manipulate XML for reading and writing. This object is called the DataSet.
ASP.NET apps often end up handling DataSet objects. DataSet objects are returned by data adapter classes, which are one of the two ADO.NET command classes that get in touch with remote data sources. DataSets can also be created from local data—any valid stream object can be read into and populate a DataSet object.
The DataSet has a powerful, feature-rich programming interface and works as an in-memory cache of disconnected data. It is structured as a collection of tables and relationships. This makes it suitable when you have to work with related tables of data. Using DataSets, all of your tables are stored in a single container. This container knows how to serialize its content to XML and how to restore it to its original state. What more could you ask for from a data container?
Devising an XML-based Caching System
The majority of ASP.NET applications could take advantage of the Cache object for all of their caching needs. The Cache object is new to ASP.NET and provides unique and powerful features. It is a global, thread-safe object that does not store information on a per-session basis. In addition, the Cache is designed to ensure it does not tax the server's memory whatsoever. If memory pressure does become an issue, the Cache will automatically purge less recently used items based on a priority defined by the developer.
Like Application, though, the Cache object does not share its state across the machines of a Web farm. I'll have more to say about the Cache object later. Aside from Web farms, there are a few tough scenarios you might want to consider as alternatives to Cache. Even when you have large DataSets to store on a per-session basis, storing and reloading them from memory will be faster than any other approach. However, with many users connected at the same time, each storing large blocks of data, you might want to consider helping the Cache object to do its job better. An app-specific layered caching system built around the Cache obje