Website issues – man down, hands tied!
At present Ackadia is having serious issues with connectivity and it’s a tad frustrating, and annoying, that there is little to nothing I can do about the situation.
[ Now fixed, but read on. ]
Well, I could change hosting companies – again – but as I’ve paid in advance for this server and, hopefully, the issue will be resolved before such a move could propagate, that seems a bit pointless. It will, however, be remembered when the time to renew (or move) comes up again. Next time, I think, I’ll just go with WordPress.com and save myself a load of money in the process. I won’t be able to host for others, but that’s not a problem as the few I do host are freebies, and the cost of keeping Ackadia up where it is, especially given the few visitors I get, is just stupid.
Anyway, the problem.
I was getting loads of page-down errors which, I quickly determined, were not from my side but server related. For instance, I could log into the root and WHM/cPanel console, but got SSL validation warnings all the way (which don’t apply to my live domain). Once logged in as an admin, any attempt to make changes either took several days longer than normal to take effect (e.g. removing a redirect), hung (soft reset), or timed out completely (graceful reboot).
So, I called support (web chat) and asked for a hard (manual) reset and was told, “Looks fine to me”. So I called again and said, it’s not fine, reboot it, please, at which point it was passed up the line to a more senior tech who said, yep, “Looks fine to me”. I didn’t get ratty, but I did give up on chat support and put a ticket in instead, as it’s harder for them to fob me off.
The first reply to the ticket ignored all the data I’d sent to chat support and cheerily said, more or less, “Looks fine to me”, do let us know if you are still having issues. Have a nice day.
Given this is an expensive dedicated server, with 4 additional layers of security and backup, and not some free or £1.99 shared job, it was and is messing with my moods. I was more like, “No, John, it’s not fine, we have a problem. Actually YOU have a problem and are telling me you don’t. Look again!” I did manage to be polite, though, without exploding.
So, after several days of reports like the following coming in from friends, WordPress, Sitelock, Sucuri, etc., I started to get a bit snotty.
Typical replies were of the form:
I am showing the server is online and accepting requests, can you please clarify the current issues you are noticing ?
Is there a specific url you are having some issues with this morning ?
We look forward to hearing back from you on this.
Linux Systems Administrator
Everyone and their brother tells me Ackadia has a problem – and still Hostgator is in denial:
From me, to them:
Sorry, but this is getting annoying now. I can barely access my own domain and traffic – such as it was – has trickled to only a few hits a day, all of which are me trying to log in and write a post slagging Hostgator off because…
Actually I just wanted to link a few posts for an academic paper on web design but it became too frustrating.
This is a dedicated server, backed up by professional rate support from Sucuri, Sitelock, and others. It costs me a relative fortune for what is a vastly over-specified indulgence which – I will point out – I only changed to because hackers got into the VPS server I had with you and you were wholly unable to keep them out, even when you put a fresh image on the server, which frankly bothered me no end at the time. Now you clearly have a network problem that is taking far, far too long to diagnose and fix. And that is after I contacted support several times and put several tickets in on top, each time initially being told there was no problem when clearly there was, and still is. This does not inspire confidence!
As a casual blogger with very low traffic, irregular posts, and no advertising, it’s not a major concern. As a manic depressive and OCPD techie, it is not helpful. More to the point, if I were running a web design business (which at one point I considered), we would have a problem. My clients would be giving me no end of grief and while trying to placate them I would be taking a very, very close look at the T&C of our contract, particularly with respect to uptime, or lack thereof.
So, what’s going on? Where are we up to? (And by ‘we’ I mean ‘you’!)
Until, eventually, they admit it might be at their end:
We have diagnosed a network issue that is present in the data center and have relayed the information to the network operations team. We are awaiting further word back and will advise as soon as we have an update.
We appreciate your patience and can assure we are working to resolve this.
Linux Systems Administrator
… and more days pass …
Our network team is still investigating but the site loaded for us without issues. Are you still having issues?
At this time we are still actively working towards a resolution in regards to network related issue. Thank you for your patience thus far. We shall provide you with additional information as it becomes available.
You get the idea! Eventually, we get to this:
My name is Alan, and I am a Customer Service Manager here at HostGator. Due to the concerns raised, this ticket has been escalated to my department for further review. We work directly with our senior management and every department here to ensure the excellence of our customers’ experience, and it is my goal to address any issues and concerns you may have.
I’m so sorry for the frustrations that this event has caused you. I have taken the time to review this ongoing support issue and would like to go over what I have found. Currently our upper tiered Linux admins have put in a ticket with our data center and both management and the admins that have worked on this issue are watching the issue for any updates. The issue requires the involvement of our Network Engineers at the data center which is where we are at with this issue. Unfortunately no further updates from them have been given to us at this time, though work remains ongoing and we continue to hammer away through the discovery and resolution process. Honestly, I do wish that I had more information to give you regarding this situation. We certainly understand how important resolving this issue is and I would like to assure you that we appreciate the urgency in this matter and are working with our Data Center Team to improve the situation.
This ticket will remain assigned to our Customer Service Management team, so please feel free to get in touch any time we may offer assistance. Please let us know if there are any questions or concerns that I may address further.
Customer Service Manager
So I waited, and waited, and waited a bit more. Then I stopped waiting:
*smiles sweetly* (hint: this is not a good sign…)
So, another week passes – and no news, no update, nothing. It’s been like this for a few weeks now.
My server – my main web site – is still down more than it is up, as are the other domains I host. Embarrassing, but they haven’t noticed, yet, so I’m not saying anything… (!)
Minor detail, trivial, hate to grumble and all that, but just to remind you… I am paying $220 a month for a LIVE server… (plus extra for SSL etc)
Glancing at your website, I cannot help but notice “Award-Winning Support – 24/7/365 Server Monitoring”
Just to remind you: prior to this ticket, and included in the log of this ticket, I contacted chat support a few times to report problems and downtimes and was repeatedly told, “Nope, we can’t see anything wrong here!”
And yes, I accept you now admit you have a problem there. (One that you – one of the biggest hosts on the planet – appear to be unable to sort.)
And again, as I read (and feel free to detect a hint of sarcasm here), I pride myself in having chosen a dedicated server with a company that boasts: “Top Of The Line Network – Fully Redundant Network with NO Single Point of Failure”
Umm, itsy, weensy, tiny problem here, maybe. Again, hate to bother, but if there is no single point of failure – and my site is still, as of this moment, down – well, it raises questions.
I would comment on uptime/downtime and SLAs, but your T&C are amazingly vague (but appear to be a lot more robust from the point of view of ToS, hmm).
Now I’ve been a computer consultant and, really, I appreciate you do get issues that you can only stare at and think “WTF? Why isn’t this working…” Or 10,000 lines of code, killed by a missing semicolon. We’ve all been there, but again, I have this major OCPD thing going on, so when I want to go to my blog to look up something, and I can’t, it does unpleasant things to my moods and blood pressure. I do not handle frustration well, all the anger goes inwards (which is not healthy), and spills out the sides as sarcasm (you may have noticed?)
So, in my best British accent, “WHAT THE BLOODY ‘ELL IS GOIN’ ON?”
Talk of compensation would be nice, but really, $100 here or there, in the great scheme of things, is nothing. My blood pressure jumping to 160/105 because you cannot fix a network issue is another matter. So, where are we up to? How long before you can say? And if you cannot, how do I go about cancelling the contract and getting a full refund for the balance – bearing in mind I paid, in full, 3 years in advance (i.e. about $10,000, including discounts).
I do seem to recall getting an email a few years back saying you were moving your data center. The transfer was so smooth I never even noticed any downtime, despite being very active in writing at the time. Now?
So, update, please…
I appreciate your uptime rules ignore outside and third-party stats, but just as a reminder: if you are logging inside the network and the issue is, as it appears, at some bridge between your robust internal network and the rest of the world, then while you might happily count 100% uptime, I – and the rest of the world – are seeing what the attached graph shows. It’s not that it’s a dead server; it’s just a major issue to connect and, as you yourself will appreciate, in this fast world we live in, if you get warning messages and have to refresh a dozen times to get a page to load… not going to happen. For me, trying to write an entry, or access posts for academic papers, it’s extremely stress-inducing. I want it sorted!
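The gap described above, between uptime counted inside the network and what outside visitors actually get, can be checked from the outside with a few lines of code. A minimal sketch, assuming Python 3 run from a machine outside the host’s network; `probe` is a hypothetical helper, and it only tests whether a TCP connection can be opened, not whether the page loads or its SSL certificate is valid:

```python
import socket


def probe(host: str, port: int = 443, attempts: int = 12, timeout: float = 5.0) -> float:
    """Open `attempts` TCP connections to host:port; return the fraction that succeeded."""
    ok = 0
    for _ in range(attempts):
        try:
            # create_connection resolves the name and completes the TCP handshake
            with socket.create_connection((host, port), timeout=timeout):
                ok += 1
        except OSError:
            # refused, unreachable, DNS failure or timeout: all count as a failure
            pass
    return ok / attempts
```

For example, `probe("example.com")` (substitute your own domain) returning 0.25 would mean only three of twelve connection attempts got through, which is roughly the “refresh a dozen times” experience described above, even though a check run from inside the network might see nothing wrong.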
FINALLY, they get the message, and I am not sure if it was because I asked for a refund, or because I told them where to look! (i.e. at the bridge level).
It appears that the network issues were caused by someone adding blackholes to the routing tables including your own IP:
(user)@(host) [~]# history | grep route
534 2017-01-24 03:03:00 route add 126.96.36.199 reject
535 2017-01-24 03:03:20 route add 188.8.131.52 reject
536 2017-01-24 03:03:34 route add 184.108.40.206 reject
537 2017-01-24 03:03:52 route add 220.127.116.11 reject
538 2017-01-24 03:04:11 route add 18.104.22.168 reject
540 2017-01-24 03:09:51 route add 22.214.171.124 reject
541 2017-01-24 03:10:09 route add 126.96.36.199 reject
542 2017-01-24 03:12:19 route add 188.8.131.52 reject
543 2017-01-24 03:12:36 route add 184.108.40.206 reject
544 2017-01-24 03:12:55 route add 220.127.116.11 reject
545 2017-01-24 03:13:15 route add 18.104.22.168 reject
546 2017-01-24 03:13:40 route add 22.214.171.124 reject
547 2017-01-24 03:13:59 route add 126.96.36.199 reject
549 2017-01-24 03:24:03 route add 188.8.131.52 reject
550 2017-01-24 03:24:17 route add 184.108.40.206 reject
551 2017-01-24 03:24:32 route add 220.127.116.11 reject
552 2017-01-24 03:24:51 route add 18.104.22.168 reject
553 2017-01-24 03:27:23 route add 22.214.171.124 reject
554 2017-01-24 03:27:42 route add 126.96.36.199 reject
After removal of the blackholes that were performed. The ping to your IP finishes without any issues.
(user)@(host) [~]# ping (ip)
PING (ip) 56(84) bytes of data.
64 bytes from (ip): icmp_seq=1 ttl=49 time=38.9 ms
64 bytes from (ip): icmp_seq=2 ttl=49 time=36.5 ms
64 bytes from (ip): icmp_seq=3 ttl=49 time=35.0 ms
--- (ip) ping statistics ---
11 packets transmitted, 11 received, 0% packet loss, time 10815ms
rtt min/avg/max/mdev = 34.194/37.716/40.843/2.243 ms
The graph provided shows visitor counts but I’m not seeing anything in regards to network connectivity nor am I seeing anything on the server indicating issues.
Linux Systems Administrator
Austin, TX US
So, basically, the whole problem was that THEY blacklisted me! It remains to be seen how far their blacklist extended. BUT, a blacklist is a blacklist, a bloody great barrier – refreshing the screen a dozen times or 10,000 times should not bypass it. And it wasn’t just my IP, it was every IP bouncing away. A blacklist would not explain a console hanging either. Rather makes me wonder what the real issue was.
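The `history | grep route` excerpt quoted above is easy to summarise mechanically, which helps answer “how far did the blacklist extend?”. A minimal sketch, assuming the `route add <IPv4> reject` lines are pasted in as plain text in the format shown; the helper names `blackholed_ips` and `is_blackholed` are hypothetical:

```python
import re
from collections import Counter

# Matches bash `history` lines of the form:
#   534 2017-01-24 03:03:00 route add 1.2.3.4 reject
ROUTE_REJECT = re.compile(r"route add (\d{1,3}(?:\.\d{1,3}){3}) reject")


def blackholed_ips(history_text: str) -> Counter:
    """Count how often each IPv4 address was added as a reject (blackhole) route."""
    return Counter(ROUTE_REJECT.findall(history_text))


def is_blackholed(ip: str, history_text: str) -> bool:
    """True if `ip` appears in any `route add ... reject` command in the history."""
    return ip in blackholed_ips(history_text)
```

Run on the excerpt exactly as printed here, it reports six distinct addresses, one added four times and the other five three times each, so the same handful of routes was being re-added over and over within about 25 minutes.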
Sorted now, for me at least, but as the saying goes, lesson learnt!
As an addendum, my Sucuri logs show a clear picture of the downtime.
Overall it was down for almost a month – despite over a dozen calls to support – a far cry from the 99.9% you should expect, hmmm.
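For a sense of scale, here is the arithmetic on a 99.9% uptime figure. A minimal sketch, assuming a 30-day month by default (the function names are hypothetical):

```python
MINUTES_PER_DAY = 24 * 60


def allowed_downtime_minutes(uptime_percent_sla: float, days: int = 30) -> float:
    """Maximum downtime, in minutes, that an uptime percentage permits over `days` days."""
    return days * MINUTES_PER_DAY * (1 - uptime_percent_sla / 100)


def uptime_percent(down_minutes: float, days: int = 30) -> float:
    """Uptime percentage achieved over `days` days given `down_minutes` of downtime."""
    return 100 * (1 - down_minutes / (days * MINUTES_PER_DAY))
```

So 99.9% permits about 43.2 minutes of downtime in a 30-day month, or roughly 8.8 hours over a year; being down for almost a month exhausts that budget many hundreds of times over.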
Now, here’s the thing, and it bears considering if you are thinking of getting a server with them: Hostgator’s agreements do NOT recognise outside logs. If 7 billion people say your web site is down and they say it looks fine to them, then, apparently, your web site is not down, at least as far as recompense goes. Put another way, if you are a web designer hosting domains for 100 clients and it goes down for a month, then if you fight hard enough you might get a refund for that month’s rent. That’s it…
Third party monitoring service reports may not be used for justification due to a variety of factors including the monitor’s network capacity/transit availability. … The 99.9% up time guarantee only applies to shared/reseller solutions. Dedicated and VPS servers are covered by a network guarantee in which the credit is prorated for the amount of time the server is down, which is not related to our uptime guarantee.
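The last clause says dedicated-server credit is “prorated for the amount of time the server is down”, but the quoted text gives no formula. Assuming a simple linear proration against the monthly fee (an assumption for illustration, not Hostgator’s published method), and using the $220 a month mentioned earlier, the credit would work out like this; `prorated_credit` is a hypothetical helper:

```python
def prorated_credit(monthly_fee: float, down_hours: float, days_in_month: int = 30) -> float:
    """Credit under an ASSUMED linear proration: fee x (downtime / month length)."""
    if down_hours < 0:
        raise ValueError("down_hours must be non-negative")
    month_hours = days_in_month * 24
    # Cap at 1.0 so the credit never exceeds one month's fee
    fraction = min(down_hours / month_hours, 1.0)
    return round(monthly_fee * fraction, 2)
```

Under those assumptions, three days down (72 hours) comes to $22.00, and a full 30-day month to the whole $220, which is the “refund for that month’s rent” ceiling described above.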