Documenting downtime

Posted by yosmc, 05-04-2007, 10:30 AM
I'm having a bit of a squabble with my current host because our server was unreachable for an extended period yesterday. Since I ran a traceroute that arrived at their datacenter but not at our server, I assumed it was a crystal-clear network issue. Today, however, they tell me that a traceroute means nothing. To quote them: What do I make of this? First time I've heard that, when the network is down, I need to contact my host to tell me. But without any polemic - is this what it boils down to?

Posted by stephanhughson, 05-04-2007, 04:09 PM
A failing traceroute doesn't necessarily mean something is down. They probably get hundreds of false reports that they are down when they're not, so they just need you to gather more evidence for them. Next time it is down, you could check your site using www.siteuptime.com (they have a quick check feature), www.megaproxy.com, and the "tcp query" tool on www.centralops.net. If all of these show your site is down, it's a pretty safe bet that it is (at least from part of the world). Combined with traceroute results, that would be pretty good evidence to show them. They'll probably be aware of any downtime through their own monitoring tools, though, so unless you have an SLA or something and they're not giving you compensation for downtime, there is probably no real need to contact them each time the site is down.
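
A check in the spirit of the "tcp query" tool mentioned above can also be run from any machine you control. The following is only a minimal sketch: the function name `tcp_check` and its default port and timeout are illustrative assumptions, not anything the tools above expose. It reports whether a TCP connection to the server completes, which says nothing yet about where along the path a failure sits.

```python
import socket


def tcp_check(host: str, port: int = 80, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout.

    Hypothetical helper for illustration; port 80 and the 5 second
    timeout are assumed defaults, adjust them for the service you run.
    """
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures alike.
        return False
```

Running the same check from several machines in different networks, and noting the time of each result, gives the kind of multi-location evidence described above.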

Posted by yosmc, 05-04-2007, 07:04 PM
Thanks for the reply, I'm just not entirely sure you got the problem. That the site(s) were down is pretty much beyond question; what's in dispute is whether it was due to an internal server issue (my fault) or a network issue (their fault). So Siteuptime.com would definitely have seen the site as down, but I don't believe it would have answered the question. I always thought that it's the traceroute that decides it: if the traceroute goes through while the site is unreachable, it's an internal server thing; if the traceroute doesn't reach the server, it's a network issue.

Posted by Digital City Host, 05-04-2007, 09:41 PM
I think that stephanhughson is trying to provide you with information to build your case that the server was unreachable, regardless of the cause. It is harder for the tech to discredit your claim when it is based on attempted communications from various physical locations. In their eyes, a trace from your location, and only your location, does not mean that the server is unreachable for the rest of the world. It is harder for them to say that failed attempts to reach the server from 10 different parts of the world do not mean there is an issue. Once your server is reachable again, you will have limited options to prove your case. So while it is not reachable, prove your case with sites like the previously mentioned ones, and here is another one: http://www.alertra.com Also, the more info you can provide the tech, the more the tech may be able to help, whatever the issue is.

Posted by tsj5j, 05-04-2007, 10:47 PM
A traceroute can stop short because a firewall along the path blocks the probe packets, etc., while the destination is still reachable (i.e. the network is up).

Posted by stephanhughson, 05-05-2007, 04:07 AM
Who manages the server? Is it a dedicated server managed by you? If so, companies in general tend to be more sceptical, as quite often dedicated servers are unmonitored and can go wrong for many reasons that aren't their responsibility to fix. It just depends on what you bought. You can check the logs, or graph them like I do: http://www.crimefightingrabbit.com/system_info.php They might not accept that though. If you are on Linux, you might try checking /var/log/messages (or the older copies, messages.1, messages.2 etc., as the entries might not be in today's file). I would look for any place where the network interface is going up or down. I'm going on a bit though. Basically, if it's a server managed by them, they will know when it's down, and unless they are trying to trick you, they'll admit when it is. If it's a server managed by you, you need to check whether you have an SLA (service level agreement) and then ask them how they calculate downtime. If they don't have a way, you could ask what they would accept as proof of downtime, or whether they could start monitoring your server. Or, you could move to a managed service perhaps.
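
Looking through /var/log/messages for the interface going up or down, as suggested above, could be sketched roughly like this. The exact wording of link messages depends on the kernel and network driver, so the pattern below is an assumption to adapt, and `find_link_events` is a hypothetical helper name.

```python
import re
from typing import Iterable, List

# Assumed pattern: a line mentioning "link" followed by "up" or "down".
# Real driver messages vary, so check a few lines from your own logs first.
LINK_EVENT = re.compile(r"\blink\b.*\b(up|down)\b", re.IGNORECASE)


def find_link_events(lines: Iterable[str]) -> List[str]:
    """Return the log lines that look like network link state changes."""
    return [line.rstrip("\n") for line in lines if LINK_EVENT.search(line)]
```

To use it, open the log (or a rotated copy such as messages.1) and pass the file object in, for example `find_link_events(open("/var/log/messages", errors="replace"))`. An outage with no link events in the log does not prove a network fault, but link events at the time of the outage would point at the server side.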

Posted by yosmc, 05-05-2007, 05:12 AM
DCH: Well, in this case they haven't doubted the fact that the server was unreachable, they just said that they don't think it was their network that was at fault. Which is why, if I had provided more evidence that the server was unreachable (e.g. as provided by SiteUpTime.com), they would likely still say the same thing. Yep, it's a dedicated server - sorry if I forgot to mention. That's why it makes a difference of course: if the server is down because I messed something up on the server itself, then it's my fault and nobody else's. Trust is fine and all, but if I pay for a service I would still like to be able to verify that I get what I pay for. That's why I find the suggestion by their help desk problematic, because in the time it takes them to reply to my ticket (let's say, 30 minutes) the problem may be gone again, so then they tell me their traceroute comes through just fine - but that doesn't answer the question of what the issues were at the time I couldn't reach the server. @tsj5j: Well, like I said, in the situation I mentioned the traceroute timed out in their datacenter, so it seems fairly certain that it wasn't blocked by some firewall along the way.

Posted by stephanhughson, 05-05-2007, 07:16 AM
It's probably too late for this time, but you could set up a system to stop this happening the next time. I think you need to discuss with them what they would accept as a way to measure downtime/uptime, make sure you are both happy with it, then go ahead with the solution you come up with.
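
One possible shape for such a system, offered only as a sketch: record a timestamped up/down sample at a fixed interval from an outside machine, then collapse the samples into outage windows that both sides can look at. The names `outage_windows` and `total_downtime` are hypothetical, and the rule used here (an outage runs from the first failed probe to the next successful one) is an assumption you would need to agree on with the host.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Sample = Tuple[datetime, bool]  # (time of probe, True if the server answered)


def outage_windows(samples: List[Sample]) -> List[Tuple[datetime, datetime]]:
    """Collapse time-ordered samples into (start, end) outage windows."""
    windows = []
    down_since = None
    for stamp, is_up in samples:
        if not is_up and down_since is None:
            down_since = stamp  # first failed probe opens a window
        elif is_up and down_since is not None:
            windows.append((down_since, stamp))  # first success closes it
            down_since = None
    if down_since is not None:
        # Still down at the last probe: close the window at that probe.
        windows.append((down_since, samples[-1][0]))
    return windows


def total_downtime(samples: List[Sample]) -> timedelta:
    """Sum the length of all outage windows."""
    return sum((end - start for start, end in outage_windows(samples)), timedelta())
```

The probe interval limits the precision: with one probe per minute, each window can be off by up to a minute at either end, which is worth stating when agreeing on how downtime is calculated.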

Posted by yosmc, 05-05-2007, 02:30 PM
That's what I'm doing right now, so thanks for the feedback.

Posted by bqinternet, 05-05-2007, 05:27 PM
Your host is correct. The packets might be taking a different route back, so even though the trace stops within the provider's network, the actual problem might be on the return path, which you won't see unless you do a traceroute in the other direction. As suggested by others, try doing traceroutes from multiple locations using a tool such as the one at http://www.traceroute.org/.
