faq:misc12 [2011/04/30 17:33] (current) – created clemens
<code>
[12] WHEN ARE THE SYSTEM MAINTENANCE WINDOWS? WHY THE LOW UPTIME?

     Typically the SDF Public Access UNIX System is available to its
     members and, in some cases, the general public 24 hours a day,
     7 days a week, 365 days a year, 10 years a decade, 25 years a
     quarter century .. and so on.

     That being said, there are unforeseen issues that can cause the
     system to become unavailable:

        1.  Hard Disk Crash - We have several spare drives, some of
            them already plugged in and ready to be used.  In the
            best case scenario no maintenance window is required.

        2.  Fire - In the case of fire, all SDF machines must be shut
            down unless the fire is an isolated occurrence.

        3.  Natural Disaster - In the Spring (Apr-May) we do get
            affected by lightning strikes in our area due to heavy
            thunderstorms.  In the best case scenario the UPS systems
            filter the spikes and dips, which allows SDF to run
            uninterrupted.

        4.  Software Bugs - These do crop up from time to time and are
            usually related to system updates.  On SDF we typically
            let the public access machines lag behind NetBSD
            development in order to test new releases in our lab before
            subjecting the userbase to 'new bugs'.

        5.  Routine and Scheduled Maintenance - Please read below.

        6.  Hardware Component Failure - We have many spare machines,
            some completely cabled up and ready to go at the flick of
            a remote command.  If an SDF client host becomes completely
            unrecoverable, a spare can be put into operation within
            minutes.  Keep in mind that while all of your personal files
            are hosted on the file server, the /tmp directory is exclusive
            to each SDF client host.
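
     To illustrate point 6 above: anything written under /tmp lives
     only on the client host you are logged into, while your home
     directory is served from the shared file server.  A minimal
     sketch (the file name here is only an example):

```shell
# /tmp is local to this client host; your home directory is not.
scratch="/tmp/scratch.$$"              # $$ expands to this shell's PID
echo "host-local data" > "$scratch"    # exists on this host only
cat "$scratch"                         # prints: host-local data
rm -f "$scratch"                       # tidy up the scratch file
```

     Logging into a different SDF client host and looking for the
     same /tmp path would find nothing, while files under $HOME are
     visible from every client host.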

     ROUTINE AND SCHEDULED MAINTENANCE

     There is a weekly maintenance window on Sunday mornings from
     02:00 to 03:00.  This window is not always used, and when it is,
     it is used very briefly.  Five minutes prior to a shutdown or
     runlevel transition, all logged-in members will be notified on
     their terminals.  If you see this message alerting you to system
     maintenance, you should save all open files and prepare to log out.
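
     On NetBSD the warning and the reboot are typically a single
     shutdown(8) invocation; the grace period is what produces the
     terminal broadcast described above.  A sketch of the usual
     mechanism (not necessarily SDF's exact procedure, and it
     requires root privileges):

```shell
# Broadcast the message to every logged-in terminal, wait 5 minutes,
# then reboot; shutdown(8) repeats the warning as the time approaches.
shutdown -r +5 "Weekly maintenance window: warm boot in 5 minutes"
```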

     Scheduled maintenance is always announced several days in advance
     on the bboard.  If that maintenance window requires extended time
     (basically anything over 5 to 10 minutes), the /etc/motd file
     (displayed at login) will note the details of the event.

     Scheduled maintenance is really only used when hardware upgrades
     have to take place.  In most cases, software updates can occur
     while the systems are up and available.
WHY THE LOW UPTIME?

     Uptime is relative.  What we're after is 'high availability'.
     This means that our goal is to have the servers answering at
     least 99.9% of the time.  In the 20+ years of service, SDF has
     been able to meet this goal.  The most uptime you'll see on any
     given server will be about 3 to 4 weeks, after which maintenance
     is necessary.  This helps with clearing buffers, caches and other
     inconsistencies that can accumulate as the systems run.  Rather
     than waiting for the system to fail due to a kernel panic or a
     hang, a warm boot is performed during the weekly maintenance
     window, which takes roughly 5 minutes or less.  Keep in mind,
     this doesn't occur weekly but usually after 3 to 4 weeks of
     linear uptime.

     Why is this necessary? (aka "My box runs for years under my desk.")
     We too have very low usage, non-public NetBSD systems that run for
     years without requiring a reboot.  However, SDF is an extremely
     high volume system with sophisticated NFS, NIS and VNODE caching.
     While these do not cause problems under light loads, with 40,000
     active users they become an issue.  Again, our goal is high
     availability, which doesn't necessarily translate into long uptimes.
</code>

[[misc|back]]