Author(s): Wong Kin Yeung (ESAP), K. H. Yeung
 
Summary:
Existing web caching designs commonly make caching decisions based on document or uniform resource locator (URL) information. While this conventional approach works, the authors offer insights into an alternative approach that uses site information for web cache design. This site-based approach makes caching decisions based on the website that an object belongs to, rather than on the object itself. The authors show that the approach can benefit cache design at several scopes: the internal operation of a single proxy (host level), the mapping of requests within a proxy array in a local area network (LAN level), and load reduction in the global cache hierarchy (wide area network (WAN) level). At the host level, disks are usually the performance bottleneck in a proxy, so the authors design a site-based cache architecture that tries to store web objects belonging to the same site in nearby disk blocks. Simulation results show that it reduces disk access time by 21-50% compared to the conventional URL-based cache architecture. At the LAN level, a site-based mapping scheme maps all requests targeting the same website to the same proxy, resulting in up to a 50% reduction in total Transmission Control Protocol (TCP) connection overhead. At the WAN level, upper-level proxies are usually overloaded. To address this, lower-level proxies use site information to decide which requests to forward directly to the origin servers instead of to an upper-level proxy. As a result, proxy load is reduced by 46-59%, and request compliance delay is reduced by 9-25%. Given these benefits at all three levels, it is believed that the site-based approach can contribute to future caching designs.