There are two metrics often considered when analyzing the risk of a given situation and configuration:
1) MTBF: Mean Time Between Failures: This is the average time between failures of a system or device. The larger this metric, the better. Ideally things would never fail, but in reality that just isn't reasonable, so when they do fail, you don't want it to happen often.
2) MTTR: Mean Time To Repair: This is the average time it takes to recover from a failure once it happens. The lower this metric, the better - meaning "fixes" are fast and the system or device can be brought back online quickly.
Since some systems or devices, for reasons beyond the scope of this posting, don't have the best of both values, understanding these metrics can help compare different scenarios. For example, which is better: high MTBF and high MTTR (doesn't fail often but is hard to repair when it does) or low MTBF and low MTTR (fails frequently but is quick and easy to repair)? Another factor mixed in there is the cost of the solution - some projects are more cost sensitive than others. Yet another factor is the criticality of the project - some can incur downtime with little more than a minor inconvenience to the user or business, while others cannot.
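One common way to put the two metrics on the same scale is steady-state availability, usually estimated as MTBF / (MTBF + MTTR). Here is a minimal Python sketch of that comparison. The function name and the hour figures are made up for illustration, not measurements from any real system, and the calculation ignores cost and criticality entirely.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Estimate the fraction of time a system is up: MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR must not be negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical scenario A: rarely fails, but takes two days to repair.
rarely_fails_slow_fix = availability(mtbf_hours=10_000, mttr_hours=48)

# Hypothetical scenario B: fails often, but is back online in 30 minutes.
often_fails_quick_fix = availability(mtbf_hours=500, mttr_hours=0.5)

print(f"High MTBF, high MTTR: {rarely_fails_slow_fix:.4%}")
print(f"Low MTBF, low MTTR:   {often_fails_quick_fix:.4%}")
```

With these particular made-up numbers the "fails often but fixes fast" scenario comes out ahead, but different inputs can easily flip the result, which is exactly why it is worth running the numbers for your own situation.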
These are good metrics to understand and consider, but in a well-designed and properly architected solution, redundancy can be built into the project to minimize the impact of failures. When precautions are taken and the solutions are designed well, system faults become just process and administrative issues rather than customer- or business-impacting situations.
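To get a rough feel for how much redundancy helps, here is a small Python sketch. It assumes the nodes fail independently of each other and that the service stays up as long as at least one node is up. Both are simplifying assumptions (shared power, network, or software bugs break the independence in practice), and the function name and figures are hypothetical.

```python
def redundant_availability(node_availability: float, nodes: int) -> float:
    """Availability of a group that is up while at least one node is up.

    Assumes independent failures: the group is down only when every
    node is down at the same time.
    """
    if not 0.0 <= node_availability <= 1.0:
        raise ValueError("node_availability must be between 0 and 1")
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return 1.0 - (1.0 - node_availability) ** nodes


# Hypothetical example: each server is up 99% of the time on its own.
for count in (1, 2, 3):
    combined = redundant_availability(0.99, count)
    print(f"{count} node(s): {combined:.6%}")
```

Under those assumptions each added node multiplies the expected downtime by the single-node downtime fraction, which is why even modest hardware can support a highly available webfarm or cluster.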
Working with customers to properly architect highly-available and highly-scalable solutions such as webfarms, mirrored systems, and clusters is a large part of the services we provide at ORCS Web. It is one of the reasons we are known for being #1 in both customer service and technical support.
Happy hosting!
~Brad