[12] WHEN ARE THE SYSTEM MAINTENANCE WINDOWS? WHY THE LOW UPTIME?

Typically the SDF Public Access UNIX System is available to its members and, in some cases, the general public 24 hours a day, 7 days a week, 365 days a year, 10 years a decade, 25 years a quarter century... and so on. That being said, there are unforeseen issues that can cause the system to become unavailable:

1. Hard Disk Crash - We have several spare drives, some of them already plugged in and ready to be used. In the best case, no maintenance window is required.
2. Fire - In the case of fire, all SDF machines must be shut down unless the fire is an isolated occurrence.
3. Natural Disaster - In the spring (April-May), our area is affected by lightning strikes from heavy thunderstorms. In the best case, the UPS systems filter the spikes and dips, which allows SDF to run uninterrupted.
4. Software Bug - These do crop up from time to time and are usually related to system updates. On SDF we typically let the public access machines lag behind NetBSD development so that we can test new releases in our lab before subjecting the userbase to 'new bugs'.
5. Routine and Scheduled Maintenance - Please read below.
6. Hardware Component Failure - We have many spare machines, some completely cabled up and ready to go at the flick of a remote command. If an SDF client host becomes completely unrecoverable, a spare can be put into operation within minutes. Keep in mind that while all of your personal files are hosted on the file server, the /tmp directory is local to each SDF client host, so anything in /tmp on a failed host will not be available on its replacement.

ROUTINE AND SCHEDULED MAINTENANCE

There is a weekly maintenance window on Sunday mornings from 02:00 AM to 03:00 AM. This window is not always used, and when it is, it is used only briefly. Five minutes prior to a shutdown or runlevel transition, all logged-in members will be notified on their terminals.
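As a rough illustration of the schedule above, the sketch below checks whether a timestamp falls inside the weekly Sunday 02:00-03:00 window and computes when the five-minute terminal notice would go out. The helper names are hypothetical and this is not SDF's actual tooling; since the FAQ does not state a time zone, timestamps are treated as naive local times.

```python
from datetime import datetime, time, timedelta

# Hypothetical helpers illustrating the schedule described in this FAQ.
# The FAQ names no time zone, so timestamps are treated as naive local times.
WINDOW_START = time(2, 0)           # Sunday 02:00 AM
WINDOW_END = time(3, 0)             # Sunday 03:00 AM
SUNDAY = 6                          # datetime.weekday(): Monday == 0 ... Sunday == 6
NOTICE_LEAD = timedelta(minutes=5)  # members are notified 5 minutes ahead


def in_maintenance_window(ts: datetime) -> bool:
    """True if ts is on a Sunday between 02:00 (inclusive) and 03:00 (exclusive)."""
    return ts.weekday() == SUNDAY and WINDOW_START <= ts.time() < WINDOW_END


def notice_time(shutdown: datetime) -> datetime:
    """When the terminal warning goes out for a shutdown scheduled at `shutdown`."""
    return shutdown - NOTICE_LEAD


if __name__ == "__main__":
    reboot = datetime(2024, 6, 2, 2, 30)  # 2 June 2024 was a Sunday
    print(in_maintenance_window(reboot))  # True
    print(notice_time(reboot))            # 2024-06-02 02:25:00
```

The window is treated as half-open (03:00 itself is outside it), which avoids counting the boundary minute twice if consecutive windows were ever checked back to back.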
If you see a message alerting you to system maintenance, you should save all open files and prepare to log out. Scheduled maintenance is always announced several days in advance on the bboard. If the maintenance window requires extended time (basically anything over 5 to 10 minutes), the /etc/motd file (displayed at login) will note the details of the event. Scheduled maintenance is really only used when hardware upgrades have to take place. In most cases, software updates can be applied while the systems are up and available.

WHY THE LOW UPTIME?

Uptime is relative. What we're after is 'high availability': our goal is to have the servers answering at least 99.9% of the time, and in its 20+ years of service SDF has been able to meet this goal. The most uptime you'll see on any given server is about 3 to 4 weeks. After about 3 weeks, maintenance becomes necessary to clear buffers, caches and other inconsistencies that accumulate as a system runs after a cold or warm boot. Rather than waiting for the system to fail due to a kernel panic or a hang, we perform a warm boot during the weekly maintenance window, which takes roughly 5 minutes or less. Keep in mind that this doesn't happen weekly, but usually after 3 to 4 weeks of continuous uptime.

Why is this necessary? (aka "My box runs for years under my desk"). We too have very low-usage, non-public NetBSD systems that run for years without requiring a reboot. However, SDF is extremely high volume, with sophisticated NFS, NIS and VNODE caching. While these do not cause problems under light loads, with 40,000 active users they become an issue. Again, our goal is high availability, which doesn't necessarily translate into long uptimes.

[[misc|back]]
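To put the 99.9% availability goal in perspective, here is a back-of-the-envelope calculation (a sketch, not SDF's own accounting). It computes the yearly downtime a 99.9% target permits, and the availability you would get if the only outage were the roughly 5-minute warm boot every 3 weeks described above. It assumes a 365-day year and ignores unscheduled outages; the function names are hypothetical.

```python
# Back-of-the-envelope availability arithmetic (illustrative only).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; assumes a 365-day year


def downtime_budget_minutes(availability: float,
                            period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Maximum downtime allowed in a period for a target availability (0..1)."""
    return (1 - availability) * period_minutes


def availability_with_reboots(reboot_minutes: float, interval_days: float) -> float:
    """Availability if the only outage is one reboot every interval_days."""
    interval_minutes = interval_days * 24 * 60
    return 1 - reboot_minutes / interval_minutes


if __name__ == "__main__":
    print(round(downtime_budget_minutes(0.999)))      # 526 (minutes per year)
    print(f"{availability_with_reboots(5, 21):.4%}")  # 99.9835%
```

Under these assumptions, a 99.9% target allows about 526 minutes (roughly 8.8 hours) of downtime per year, while a 5-minute reboot every 3 weeks alone works out to about 99.98% availability, so the scheduled warm boots use only a small part of that budget.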