Skip to main content

Amazon EC2 Connectivity Failures - 10/4/2014

I have seen indications of periodic connectivity issues impacting Amazon's EC2 Cloud Computing architecture. Personally, I have encountered issues with connecting to Amazon's Yum repository hosts from EC2 instances.

Amazon has published Outage notifications of brief connectivity and DNS failures impacting US-EAST-1 Availability zone between October 2nd and October 4th. However, my EC2 instances are within the US-WEST-2 Availability zone and I am experiencing issues today, October 4th 2014 at approximately 11:30 AM EST.

For example:

# yum provides seinfo
Loaded plugins: amazon-id, rhui-lb

epel/x86_64/filelists_db                                        | 4.7 MB  00:00:01
rhui-REGION-rhel-server-optional/7Server/x86_64/filelists_db    | 3.2 MB  00:00:00

https://rhui2-cds01.us-west-2.aws.ce.redhat.com/pulp/repos//content/dist/rhel/rhui/server/7/7Server/x86_64/os/repodata/e5ee2c196ee6525998525a2bf74bb40608dce199-filelists.sqlite.bz2: [Errno 14] HTTPS Error 404 - Not Found

Trying other mirror.

https://rhui2-cds02.us-west-2.aws.ce.redhat.com/pulp/repos//content/dist/rhel/rhui/server/7/7Server/x86_64/os/repodata/e5ee2c196ee6525998525a2bf74bb40608dce199-filelists.sqlite.bz2: [Errno 14] HTTPS Error 404 - Not Found

Then, 5 minutes later, with absolutely no changes to my server's network or yum configuration:

# host rhui2-cds01.us-west-2.aws.ce.redhat.com
rhui2-cds01.us-west-2.aws.ce.redhat.com has address 50.112.120.15

# yum provides seinfo
Loaded plugins: amazon-id, rhui-lb
setools-console-3.3.7-46.el7.x86_64 : Policy analysis command-line tools for SELinux
Repo        : rhui-REGION-rhel-server-releases
Matched from:
Filename    : /usr/bin/seinfo

I find this extremely frustrating. With my small presence on EC2, I have no ability to troubleshoot what is causing these issues. However, I can confirm that there *are* issues as of today, that Amazon has been aware of connectivity and DNS failures for at least two days, and that Amazon is currently claiming that there are no issues.

This is quickly becoming the industry-standard mode of behavior for Cloud computing providers: wild-eyed, outlandish promises of perfect availability followed by regular connectivity failures that are haphazardly brushed under the rug.

Customers are owed transparency. I remain convinced that the only way to accomplish reliability is by "doing it yourself" and colocating servers in multiple datacenters, implementing and managing redundancy directly. The issue is too important to trust to hosting providers who have consistently demonstrated dishonesty.

See for yourself the almost invisible notice Amazon has posted to customers on their Service Health Dashboard:

Amazon EC2 Buries Connectivity Failure Notifications
Downtime? What Downtime?