Feb 02 2014
 

A lot of providers will sell you a 1 Gbps port with oodles of bandwidth, and a test file download to boot. For most people that seems to be enough. In most cases it is not. Good speed and bandwidth are just one piece of the cake. You can have a lot of bandwidth allocated, but if the provider has bad network connectivity, poor maintenance, decade-old routers and so on, you will never reap any of the benefits. That is why some dedicated hosts do worse at serving end users than sites on smaller shared hosting.

Here’s how to evaluate a new provider when you do not know of any existing sites hosted there that you could test against. (And please don’t use the provider’s advertised test IPs and test files as targets; those are specially set up servers.)

First, find out the organization name. Here I target “NoUptime” (fictitious), since they actually have oodles of packet loss and make a good example (in a bad way). Nothing against them; they are a great low-cost provider.

 

Next we need to find an IP that is live with this provider and test against it.

1. Let us go to RIPE, who maintain the list of who owns which IPs. It is not always up to date, but for our purposes it is accurate enough.

I will visit their database search:

https://apps.db.ripe.net/search/full-text.html

Type in NoUptime and you will see some results.

On the right side of the page you have a “result type” filter. Select “inetnum”, i.e. IPv4. You can also try inet6num for IPv6 if that’s what you want to test.

The results update accordingly.

2. Select an IP from the results. You may have to try several before you get one that is live. I normally select the first IP in the range given in the results. Try different search result pages, not just the first.

So I get an IP, let’s say 10.2.3.4 (not a real IP).
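Picking candidate addresses out of an inetnum range can be sketched in a few lines of Python. This is only an illustration: the range string below is made up to match the placeholder IP above, and the helper name `candidate_ips` is hypothetical. It assumes the range is written the way RIPE shows it, as "start - end".

```python
import ipaddress


def candidate_ips(inetnum, limit=5):
    """Return the first few usable host addresses in a 'start - end' range."""
    start_text, end_text = (part.strip() for part in inetnum.split("-"))
    start = ipaddress.IPv4Address(start_text)
    end = ipaddress.IPv4Address(end_text)
    candidates = []
    # A range may span several CIDR blocks, so walk each block in turn.
    for network in ipaddress.summarize_address_range(start, end):
        for host in network.hosts():
            candidates.append(str(host))
            if len(candidates) == limit:
                return candidates
    return candidates


# Placeholder range, not a real allocation.
print(candidate_ips("10.2.3.0 - 10.2.3.255", limit=3))
```

Whether any of these addresses is live still has to be checked by hand, as described in step 2.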

3. Run MTR

A trace in one direction is only half the report, but it is still a good indicator. So let us begin. Get the MTR tool if you haven’t already. Once you have it installed, run it against the target IP.

 

Example:

# mtr 10.2.3.4 (output edited to remove real identifying IPs etc.)

Host                                          Loss%   Snt   Last   Avg  Best  Wrst StDev
 1. 192.168.0.1                                0.0%   623    0.4   0.5   0.4   1.4   0.1
 2. some.ip.near.you                           0.0%   623    3.7   6.0   3.0 102.9   8.0
 3. some.ip.further                            0.0%   623    4.9   7.7   3.7 210.7  17.8
 4. some.ip.on.isp                             0.0%   623    4.2  10.0   3.9 206.4  24.1
 5. some.ip.isp.isp                            0.0%   623   10.2   7.6   4.8  40.6   2.1
 6. init7.net.any2ix.coresite.com              0.0%   623  173.6 171.4 169.6 197.7   1.7
 7. r1nyc1.core.init7.net                      0.0%   623  256.7 255.8 251.4 267.1   3.6
 8. r1lon1.core.init7.net                      0.0%   623  321.7 317.0 312.5 332.3   4.3
 9. r1nue1.core.init7.net                      0.0%   623  335.1 336.0 333.7 347.9   2.6
10. gw-nouptime.init7.net                      0.0%   622  333.5 339.0 332.3 415.5  13.3
11. core12.nouptime.de                        44.4%   622  335.5 335.6 334.2 351.6   1.6
12. core22.nouptime.de                        12.5%   622  336.8 337.3 335.3 360.8   2.9
13. juniper2.rz13.nouptime.de                 10.0%   622  373.3 351.5 328.5 424.0  23.5
14. hos-tr4.ex3k11.rz13.nouptime.de           15.0%   622  331.9 332.4 329.9 343.2   1.6
15. static.4.3.2.10.clients.your-server.de    20.0%   622  337.1 336.9 335.2 346.5   0.9

The output above shows packets sent to a NoUptime IP that hosts a customer’s server, and this amount of packet loss is really, really bad. Lost packets have to be retransmitted. If you were on an IP at NoUptime you could even run the same trace from the server back to your connection. It is safe to say it would be just as bad.

Reading the report as it stands, almost 50% of the packets are lost inside NoUptime at hop #11, which would mean nearly half the data you send never arrives and has to be resent. What’s worse, at hop #12 another 12.5% of the packets that do manage to survive hop #11 die of unnatural causes. So at the end of the day your 100 MB file will take more than twice as long to upload.

Now the CAVEAT: providers claim that some routers don’t respond to ICMP; what I do not understand is why they’d respond to half the packets and not all. In any case, what you need to look at is the last hop, in this case #15. Here we see 20% packet loss. This is the REAL loss, and that is what matters. Again, the reason I say to try different IPs is that someone may have configured their network wrongly, for example by not turning “autoneg on”, which was my case.
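Reading off the last-hop loss can be automated with a small parser. The sketch below assumes hop lines shaped like the example above (hop number, a dot, host, loss percentage); real mtr output may be laid out differently depending on version and mode, so the regular expression is an assumption, and the function names are made up for illustration.

```python
import re

# Matches lines such as "11. core12.nouptime.de   44.4%   622 ..."
HOP_LINE = re.compile(r"^\s*(\d+)\.\s+(\S+)\s+([\d.]+)%\s+(\d+)")


def parse_hops(report):
    """Return (hop_number, host, loss_percent) for each hop line found."""
    hops = []
    for line in report.splitlines():
        match = HOP_LINE.match(line)
        if match:
            hops.append((int(match.group(1)), match.group(2), float(match.group(3))))
    return hops


def final_hop_loss(report):
    """Loss percentage reported at the last hop, i.e. the destination."""
    hops = parse_hops(report)
    if not hops:
        raise ValueError("no hop lines found")
    return hops[-1][2]


SAMPLE = """\
Host                                          Loss%   Snt   Last   Avg  Best  Wrst StDev
10. gw-nouptime.init7.net                      0.0%   622  333.5 339.0 332.3 415.5  13.3
11. core12.nouptime.de                        44.4%   622  335.5 335.6 334.2 351.6   1.6
15. static.4.3.2.10.clients.your-server.de    20.0%   622  337.1 336.9 335.2 346.5   0.9
"""

print(final_hop_loss(SAMPLE))
```

The header line is skipped because it does not start with a hop number.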

I have observed that the reverse MTR from NoUptime is even worse. Downloads, which account for over 90% of all regular website traffic, will therefore suffer greatly and hurt the end-user experience. This provider is consistent enough that I could decide one day (today) to write about how to test and still get the same sort of lossy MTR. I mention this because some packet loss is normal on the internet, and one lossy trace does not mean it is the same from every location or on every day.

To give any provider the benefit of the doubt, you need to run the test over a couple of days rather than continuously. Take a sample every few hours for a few days (2-5 days). Try from multiple locations if you have SSH access to remote servers you currently run, and use RIPE to find different IPs at the provider. Device faults and packet loss can be fixed by engineers once detected, and normally they fix them on their own. If their network maintenance is really poor and they do not fix it despite customer complaints, then now you know.
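The sampling routine can be sketched roughly as follows. The helper names are hypothetical and the intervals are only examples; the mtr options used are `-r` (report mode), `-n` (no DNS lookups) and `-c` (number of probes). `run_sample` needs mtr installed and is not called here.

```python
import subprocess
from datetime import datetime, timedelta


def build_mtr_command(target, cycles=100):
    """Command line for one non-interactive mtr report."""
    return ["mtr", "-r", "-n", "-c", str(cycles), target]


def sample_times(start, days=3, every_hours=4):
    """Evenly spaced sample times covering the given number of days."""
    count = days * 24 // every_hours
    return [start + timedelta(hours=every_hours * i) for i in range(count)]


def run_sample(target, cycles=100):
    """Run one report and return its text (requires mtr on the PATH)."""
    result = subprocess.run(
        build_mtr_command(target, cycles),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Print an example schedule: every 6 hours over 2 days.
for when in sample_times(datetime(2014, 2, 2, 8, 0), days=2, every_hours=6):
    print(when.isoformat())
```

In practice a cron job calling `run_sample` from each remote server and saving the output with a timestamp would do the same job.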

they has your packets.


Do you have any more ideas on how to evaluate a new Hosting Provider? Share in the comments please.

Until next time.

 

EDIT: I have corrected the example. I ran a real traceroute at the time and, out of laziness, adjusted the numbers manually. I have now marked the packet loss by hop correctly to explain the scenario. The point is that if you see zeroes in between, it is not an issue. It should be loss from one starting point through to the end, or to the end minus one hop (in case the destination host is also blocking ICMP).

Also see comments from Chris below

  2 Responses to “Before signing up with the provider, test their network”

  1. There are many issues with your concept that need to be addressed, but the most important is this: Hop 11 shows 44.4% of packets dropped, but this is loss AT the router, this is not an indication of packets lost THROUGH the router.

    MTR sends out equal traces every time. If hop 11 were truly dropping 45% of the packets, then you would see that cascade through the path beyond hop 11. What happens above is that there is a 12% loss at hop 12, and then 0% loss at hops 14 and 15.

    This is because of ICMP deprioritization and how traceroutes work. Traceroute commands work by sending out ICMP packets with steadily increasing TTL (time-to-live) values. Each hop that a packet takes decrements this TTL by one. When the TTL hits zero, the device discards the packet and sends an ICMP type 11 (time exceeded) packet back to the source. The source notes the address of the device that sent the type 11 packet and records it as a hop. The source then sends out another packet with an incremented TTL to repeat the process.

    Most routers are running remarkably slow CPUs, sub-1GHz. Cisco and Juniper get away with this by designing their equipment such that most tasks don’t touch the CPU, they’re handled in the hardware. Tasks that do require the CPU are important tasks such as managing the routing tables, SNMP monitoring, and generating ICMP type 11 time exceeded responses. Most network admins believe that BGP sessions are more important to a router than responding to ICMP requests, so if the router is too busy handling more important tasks it may not be able to respond quickly enough before the packet itself expires and is dropped.

    Seeing lag or lost packets at a single hop in the middle of a trace is meaningless. Seeing lag or lost packets through a hop, cascading to the end, is much more useful knowledge. That said, you should still be using tools like ping to determine your latency between critical endpoints (your primary location and your disaster recovery site, or your datacenter location and your primary target audience’s location, for example) and speed tests where possible (iperf is a great tool; http and ftp will also work).
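One rough way to tell loss at a single hop from loss that cascades to the end is to ask, for each hop, what is the smallest loss seen from that hop onward. This is a heuristic sketch of the idea in the comment above, not something mtr itself does; the function names and the 1% threshold are assumptions.

```python
def sustained_loss(losses):
    """For each hop, the smallest loss seen at that hop or any later hop."""
    sustained = []
    running_min = float("inf")
    for loss in reversed(losses):
        running_min = min(running_min, loss)
        sustained.append(running_min)
    return list(reversed(sustained))


def first_lossy_hop(losses, threshold=1.0):
    """1-based index of the first hop whose loss persists to the end, or None."""
    for index, loss in enumerate(sustained_loss(losses), start=1):
        if loss >= threshold:
            return index
    return None


# Loss at one middle hop only: likely ICMP deprioritization, not real loss.
isolated = [0.0, 0.0, 44.4, 0.0, 0.0]
# Loss that starts at hop 3 and is still present at the destination.
cascading = [0.0, 0.0, 44.4, 12.5, 10.0, 15.0, 20.0]

print(first_lossy_hop(isolated))   # None
print(first_lossy_hop(cascading))  # 3
```

In the second case the loss figure to trust is still the one at the final hop, not the 44.4% spike.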
