Director's Report: DEBRIEF PART 2: Technical Details and the Future

(Updated: Friday 5th March 5:00PM)





System Status
11th Mar @ 20:10
  • Incoming Mail
  • Web Access
  • Client Access
  • Hosting Service
 
Service Status
This page shows a real-time display of our server infrastructure. A green dot indicates an online server, whilst a red dot indicates a server that is currently unavailable.



Infrastructure Explained
Our service is configured to utilise multiple redundant servers for critical services. Each box in the diagram is a logical grouping of servers that are able to 'fail over' to each other. As long as one server within a box is available, the service as a whole is available.
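
As a rough illustration of the 'fail over' rule described above, the following minimal sketch treats each box as a list of servers and reports the box as available when at least one member is online. The `Server` class, the server names and the `group_available` helper are hypothetical and are not taken from our actual monitoring code.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str     # hypothetical server name, for illustration only
    online: bool  # True corresponds to a green dot, False to a red dot


def group_available(servers: list[Server]) -> bool:
    """A box (failover group) is available if at least one server in it is online."""
    return any(server.online for server in servers)


# Example: one of two mail servers is down, but the group still serves users.
mail_group = [Server("mail-1", online=False), Server("mail-2", online=True)]
print(group_available(mail_group))
```

An empty group is reported as unavailable, since there is no server left to take over.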

Coloured Groupings Explained
To make it easier to visualise our infrastructure, we have colour-coded various blocks to highlight their roles in handling your email.

  • Yellow
  • This area handles incoming email, which usually reaches one of the 'AV/GreyListing' gateway servers first, followed by one of the spam firewalls. Once scanned at both stages, your new email is delivered to one of the mail servers.
    If the 'AV/GreyListing' servers are busy, email is delivered instead to any of the 'Backup MX' servers, which deliver your email onwards as soon as they can.

  • Blue
  • This area is the core of the service. The mail servers are the servers you connect to when reading your email, whilst the mail storage servers are simply huge storage arrays where all the email is stored.
    Nightly backups of email on these arrays are made to 'Mail Backups'.
    'Web Service' is responsible for any web-based service, in particular the website and webmail, whilst 'Database Service' stores everything required by the various other stages of the service.

  • Green
  • This final area is the part of the service that handles your web hosting account, if you have an applicable hosting subscription.
    'FTP Service' lets you update your website, whilst 'Hosting Service' provides website access to your visitors.
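
The incoming email path described under Yellow can be sketched roughly as below. This is only an illustration: the `route_incoming` function is hypothetical, and it assumes that a 'Backup MX' server hands held email back to the normal gateway path, which the description above does not spell out.

```python
def route_incoming(gateways_busy: bool) -> list[str]:
    """Return the ordered stages an incoming email passes through."""
    path = []
    if gateways_busy:
        # Assumption: Backup MX holds the email, then delivers it onwards
        # through the normal scanning stages once the gateways are free.
        path.append("Backup MX")
    path += ["AV/GreyListing", "Spam Firewall", "Mail Server"]
    return path


print(" -> ".join(route_incoming(gateways_busy=False)))
print(" -> ".join(route_incoming(gateways_busy=True)))
```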

    Online Times
    We also show you how long each server has been online (where data is available) in minutes, hours or (far more likely!) days. This lets you judge whether a recent problem was service related by checking whether a server (or block) came back online around the time of your issue.
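
To illustrate how an online time can be read, here is a small sketch that formats an uptime in minutes, hours or days and checks whether a server came back online close to the time of an issue. The function names and the 15-minute window are assumptions made for this example, not values used by the status page.

```python
from datetime import datetime, timedelta


def format_uptime(uptime: timedelta) -> str:
    """Express an uptime in the largest whole unit: minutes, hours or days."""
    minutes = int(uptime.total_seconds() // 60)
    if minutes < 60:
        return f"{minutes} minutes"
    hours = minutes // 60
    if hours < 24:
        return f"{hours} hours"
    return f"{hours // 24} days"


def restarted_near(issue_time: datetime, now: datetime, uptime: timedelta,
                   window: timedelta = timedelta(minutes=15)) -> bool:
    """True if the server came back online within `window` of the issue time."""
    came_online = now - uptime
    return abs(came_online - issue_time) <= window


print(format_uptime(timedelta(days=3, hours=4)))
```

If `restarted_near` returns True for the time you noticed a problem, the problem may well have been service related; if the server has been online for days, it probably was not.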

    Offline Servers
    If you do see a red light, you can rest assured we will be informed immediately. Our systems are all fully automated and continually check each other for availability, and are also checked from external sources. The instant a problem is detected, that individual server is removed from the cluster and service continues. Servers will have routine maintenance performed on them from time to time, and you will always be warned if any maintenance will cause an outage (but not if it will not affect overall service functionality).
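
The idea of removing a failed server from the cluster can be sketched as follows. This is a simplified illustration only: `prune_cluster` and the `is_healthy` callback are hypothetical stand-ins for whatever availability checks are really run, and the server names are made up.

```python
from typing import Callable


def prune_cluster(cluster: set[str],
                  is_healthy: Callable[[str], bool]) -> tuple[set[str], set[str]]:
    """Split a cluster into servers kept in rotation and servers removed."""
    active = {server for server in cluster if is_healthy(server)}
    removed = cluster - active
    return active, removed


# Example: web-2 fails its check, so only web-1 and web-3 keep serving.
cluster = {"web-1", "web-2", "web-3"}
active, removed = prune_cluster(cluster, lambda server: server != "web-2")
print(sorted(active), sorted(removed))
```

Service continues as long as the active set is not empty.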

    In addition, we sometimes take servers out of the cluster for extended periods of time, and new servers that are being prepared for addition to the cluster may appear here before they are actually used. In these cases the server will appear here with a red dot for an extended period, although there is no need to worry, as these downtimes are intended.
    News
    3 new in last 7 days

    11/03/2010
    Overnight Storage [Resolved]
    One of the storage servers still hadn't mounted after last night's power breaker failure, despite appearing to have done so. As a result, those of you with email on this server were unable to access your email from 20:36 last night until 08:15 this morning.


    We sincerely apologise for this further outage. This is the 4th week in a row we've suffered some sort of hardware failure that has affected the service, and we're seriously considering all available options. Up until now these simply included new machines built with dual power supplies and feeds, but discussions now include moving to a new datacenter provider, as years of trust have been wiped out by the constant stream of failures outside of our control.

    11/03/2010
    Brief Outage [Resolved]
    20:36 - 20:42
    Brief outage caused by 8 servers all appearing to lose power at the same time, despite being on battery/generator backups in case of failure. We are currently investigating the exact cause with the datacenter.


    20:55
    One of the storage servers isn't responding, so some of you will be unable to log in at the current time. We should have it online in a few minutes.


    20:56
    Storage remounted, service fully online. Still determining original cause of the outage.

    22:57
    Datacenter has confirmed that a faulty power breaker tripped and had to be replaced with an on-site spare. Power was down for under 5 minutes but it took a few further minutes for the servers to boot back up. As you know we have been investigating A/B power feeds to prevent any single power fault from taking down the service.

    04/03/2010
    Datacenter Power Failure [Resolved]
    19:37 - A power circuit failed in the datacenter, knocking out 3 servers. Service was slow because two of those servers were webservers, which restricted the speed of web access. Client access and inbound email were both unaffected, as no mail servers were on the failed circuit.


    20:07 - All affected servers back online, web access stabilising across the webservers. Service may be a little slow for the next 10-15 minutes as users online are redistributed over available webservers.