Ok, I'm typing this while _____ angry, so please excuse any typos or other nonsensical details.
Twice in the last seven days the server was unreachable for about 20 minutes. In both cases it was a network problem at the data center where the server is housed. They have been having some network attacks on a few servers, and those have impacted the whole network until they can mitigate the attacks.
Tonight I got a text alert and assumed it was the same thing, since I had been told they were making some configuration changes but that we could still see more of those outages over the next week.
Both my monitoring and the data center's monitoring alerted on BF being down, but unfortunately they didn't take immediate action, even though the server is under management with them and they are supposed to. I was in the shower when the text message came in, so the site had been offline for about 20 minutes before I knew about it and contacted support, assuming it was another network problem.
After around 30 minutes, the server still wasn't up, so I started investigating myself. Earlier in the day, the data center was supposed to be running a cable between this server and a small ATOM server I keep to store backups in case of a major failure of this server. This cable was hooked to the secondary LAN adapter, and they had to set up a private IP address so the two servers could talk to each other and the backups could be transferred over that cable without impacting site performance each morning when they are copied to the backup server.
Anyway, I had never gotten an update on that ticket, so I assumed they put it off until tomorrow, but when this happened, I figured they might have misconfigured the network. I had given VERY specific instructions to check and make sure BroncosForums.com was loading after they configured that secondary LAN adapter, but obviously that wasn't done.
Bottom line, a tech screwed up the configuration of the secondary LAN (more accurately, I believe it was how he restarted the network service on the server), but more importantly, didn't catch it because he failed to follow my instructions to check that BF was loading properly.
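For anyone curious, the kind of check I asked for is easy to script. Below is a rough sketch in Python, not what the data center actually runs. The function names, the `http://` scheme, the 10-second timeout and the rule that any 2xx or 3xx response counts as "up" are all assumptions made for illustration.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if nothing answered."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The web server answered, but with an error status (4xx/5xx).
        return err.code
    except OSError:
        # DNS failure, refused connection, timeout, unreachable network, etc.
        return None


def site_is_up(url, fetch=fetch_status):
    """Treat any 2xx or 3xx response as 'the site is loading' (an assumption)."""
    status = fetch(url)
    return status is not None and 200 <= status < 400


if __name__ == "__main__":
    # Hypothetical usage: run right after restarting the network service.
    url = "http://BroncosForums.com/"  # scheme is assumed
    print("OK" if site_is_up(url) else "DOWN")
```

Run from a machine outside the data center right after a network change, something like this would have flagged the problem within seconds.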
It wasn't until I pointed out that this was probably what happened that they discovered their error, fixed it and got BF back up, but not before it had been down for about 50 minutes.
This was a major screw up on the data center's part. They have typically been very reliable and while I am pissed beyond words right now, at this point I'm going to assume this is an anomaly and hopefully we don't suffer this kind of screw up again.
Sorry about the downtime.
T