You can't change the laws of nature.

The laws of nature are usually considered to be the fundamental, unchanging and unbendable laws of the universe (except in science fiction movies, of course). Although it is not always obvious, these laws do have an impact on system design. Often the effects don't show up until you push the hardware toward its maximum performance. They need to be taken into consideration during the design phase because you can't get around them later, no matter how hard you try.

Rule #1: The speed of light (or electrons) cannot be increased.

There is often a lot of discussion about the network bandwidth that a site will require. The figure is usually arrived at by looking at the number of pages to be served and the size of an average page. While this is an important number, an equally important one is the physical distance to a critical resource. With today's distributed web sites, that critical resource may be several thousand miles away. In many cases the request for information is small, so total bandwidth is not the limiting factor. What matters is the response time (how long it takes to get the information), and response time depends on several factors, not the least of which is distance. Given two otherwise identical networks, one with one mile between hosts and the other with one thousand miles between hosts, the shorter one will have the better response time. You will not be able to overcome the distance with technology. The request and reply can get there and back no faster than a signal can travel through the wire or fiber, which is always somewhat slower than the speed of light in a vacuum. And you can't change that.
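To put rough numbers on the one-mile versus one-thousand-mile comparison, here is a minimal Python sketch of the floor that distance alone puts under response time. The velocity factor of 0.67 is an assumption (a commonly quoted ballpark for optical fiber), and the function name is made up for illustration. Real links will be slower than this bound.

```python
# Lower bound on round-trip time imposed by distance alone.
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in a vacuum
VELOCITY_FACTOR = 0.67             # assumption: signal speed in fiber as a fraction of c
KM_PER_MILE = 1.609344


def min_round_trip_ms(distance_miles, velocity_factor=VELOCITY_FACTOR):
    """Return the best-case round-trip time in milliseconds.

    Ignores every device, queue and application on the path, so the
    real figure can only be larger.
    """
    one_way_km = distance_miles * KM_PER_MILE
    signal_speed_km_s = SPEED_OF_LIGHT_KM_S * velocity_factor
    return 2 * one_way_km / signal_speed_km_s * 1000


for miles in (1, 1000):
    print(f"{miles:>5} miles: at least {min_round_trip_ms(miles):.3f} ms round trip")
```

Under these assumptions the thousand-mile link starts with a floor of roughly 16 ms per round trip before any equipment is involved, and no amount of added bandwidth lowers it.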

Rule #2: There is no free ride; every device introduces delay.

When the first rule has been ignored, I'm often asked whether there is some way to speed the link up. The answer is no. Beyond a certain distance, you have to add a booster or repeater. That device adds a delay, as does every other device through which the signal must pass. This is in addition to any application-induced delay.
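The arithmetic behind this rule is plain addition, sketched below in Python. The per-device figures are invented for illustration, not measurements of any real equipment, and the function name is hypothetical.

```python
def path_delay_ms(propagation_ms, device_delays_ms, application_ms=0.0):
    """Total one-way delay: distance, plus every device, plus the application."""
    return propagation_ms + sum(device_delays_ms) + application_ms


# Hypothetical path: 8 ms of propagation, two repeaters and a router,
# and 2 ms spent in the application itself.
devices = [0.5, 0.5, 1.0]
print(path_delay_ms(8.0, devices, application_ms=2.0))
```

Every term is zero or positive, so adding a device to the path can only push the total up.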

Rule #3: The large don't eat the small; the fast eat the slow

We often like to think that the large, powerful animals prey on the smaller ones. In actuality, the true mark of a predator is speed and flexibility rather than size. Killer whales eat blue whales; lions eat giraffes; wolves eat deer and moose. It should also be noted that all of these predators hunt in packs. By analogy, groups or clusters of smaller servers may provide a more flexible and powerful solution than a few large servers.

Rule #4: Complexity takes more food than simplicity

This rule balances rule #3. As organisms grow and become more complex, they need more food (i.e., system administration time). If you don't want to be eaten by the complexity, then you have to plan for simplicity. This means that the system is documented, that there are clear lines of communication, that the system 'knows' how to call for help, and that each piece is as similar to the other pieces as possible. If the system is built out of 100 pieces that are each 95% identical, then the complexity is reduced.

Rule #5: Well-designed systems approach self-maintenance.

Well-designed systems should clean up after themselves (log rotation, file system cleanup).
They should let someone know if something is wrong (alerts, email, etc.).
They should yell when attacked.
They should be able to tell the doctor what has gone on in their lives for the last few days so he can figure out what is wrong (history logs).
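As one possible illustration of these habits, the sketch below uses Python's standard logging module. The size limit, the backup count, the logger name and the AlertHandler class are all assumptions made for the example. A real system would route alerts to email or a pager rather than to a list.

```python
import logging
import logging.handlers
import os
import tempfile


class AlertHandler(logging.Handler):
    """Hypothetical handler: passes serious records to a callback (the 'yell')."""

    def __init__(self, callback, level=logging.ERROR):
        super().__init__(level=level)
        self.callback = callback

    def emit(self, record):
        self.callback(self.format(record))


def build_logger(log_path, alert_callback):
    logger = logging.getLogger("selfcare_demo")  # name chosen for this example
    logger.setLevel(logging.INFO)
    logger.handlers.clear()

    # Clean up after itself: rotate at about 1 KB and keep 3 old files.
    # These history logs are also what "the doctor" reads later.
    history = logging.handlers.RotatingFileHandler(
        log_path, maxBytes=1024, backupCount=3
    )
    history.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(history)

    # Let someone know when something is wrong.
    logger.addHandler(AlertHandler(alert_callback))
    return logger


alerts = []
log_dir = tempfile.mkdtemp()
logger = build_logger(os.path.join(log_dir, "app.log"), alerts.append)

for i in range(200):
    logger.info("routine event %d", i)
logger.error("disk nearly full")

print(sorted(os.listdir(log_dir)))  # current log plus at most 3 rotated files
print(alerts)
```

The point of the design is that nobody has to remember to prune the logs or go looking for errors; the system does both on its own.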

Rule #6: In the end, the hardware matters

What a system is made of will eventually determine its long-term survival and performance characteristics. These may be hidden or masked, but under stress they will be revealed. An interesting example of this is disk throughput and how it is affected by the sizing of data elements. No matter how much cache RAM, RAID software or logic you have between your application and the disk platter, eventually the data is written to a platter (or platters). One of the characteristics of any disk drive is how much data it can transfer to or from a platter in one revolution of the disk. If your basic data element is smaller than that quantity, then your performance under stress will be better. If it grows larger than that quantity, then performance will suffer. This characteristic won't be visible until you max out your cache, but it is there nonetheless. And we know when that performance hit will occur: 2 AM on a holiday.
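The per-revolution quantity can be estimated from a drive's sustained transfer rate and its rotational speed. The Python sketch below uses made-up drive figures (36 MB/s at 7200 RPM) and hypothetical function names. It is a simplification that ignores seek time, zoned recording and controller behavior.

```python
import math


def bytes_per_revolution(transfer_bytes_per_s, rpm):
    """Data moved to or from the platter in one revolution."""
    revolutions_per_s = rpm / 60
    return transfer_bytes_per_s / revolutions_per_s


def revolutions_needed(element_bytes, transfer_bytes_per_s, rpm):
    """Whole revolutions required to transfer one data element."""
    per_rev = bytes_per_revolution(transfer_bytes_per_s, rpm)
    return math.ceil(element_bytes / per_rev)


# Hypothetical drive: 36 MB/s sustained transfer, 7200 RPM.
RATE = 36_000_000
RPM = 7200

print(bytes_per_revolution(RATE, RPM))           # bytes available per revolution
print(revolutions_needed(8_192, RATE, RPM))      # small element
print(revolutions_needed(1_000_000, RATE, RPM))  # large element
```

With these assumed figures, an 8 KB element fits comfortably within one revolution, while a 1 MB element needs several, and each extra revolution costs 60/RPM seconds (about 8.3 ms at 7200 RPM) once the cache can no longer hide it.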

Conclusion

This paper has grown out of my observation that many of the problems I'm asked to solve are systemic. This means that the correct fix is to redesign the system. In reality, most of the time it means patching the existing infrastructure because the resources are not available for redesign and reimplementation. Just as it is hard to change a first impression once made, it is hard to redesign a system once implemented.


Author

Jim Wildman has been mucking around Unix since 1985. After a few years on Suns, he changed jobs and started working on HP's. Then he added Linux in 1994 or 95, a dabbling of AIX in '98, and most recently some SGI IRIX and Solaris. Those jobs included stints in electronics manufacturing, healthcare, and now the online sales industry. He is currently employed as a lead consultant by divine/Whittman-Hart in Dallas, TX. These pages were produced using some combination of RedHat Linux, Quanta, txt2html, and of course vim.
Comments, suggestions and kudos to jim@rossberry.com. Flames to /dev/null. All trademarks are the property of their respective holders. Last update May 9, 2001.