Amazon has published outage notifications about brief connectivity and DNS failures affecting the US-EAST-1 region between October 2nd and October 4th. However, my EC2 instances are in the US-WEST-2 region, and I am experiencing issues today, October 4th, 2014, at approximately 11:30 AM EST.
    # yum provides seinfo
    Loaded plugins: amazon-id, rhui-lb
    epel/x86_64/filelists_db | 4.7 MB 00:00:01
    rhui-REGION-rhel-server-optional/7Server/x86_64/filelists_db | 3.2 MB 00:00:00
    https://rhui2-cds01.us-west-2.aws.ce.redhat.com/pulp/repos//content/dist/rhel/rhui/server/7/7Server/x86_64/os/repodata/e5ee2c196ee6525998525a2bf74bb40608dce199-filelists.sqlite.bz2: [Errno 14] HTTPS Error 404 - Not Found
    Trying other mirror.
    https://rhui2-cds02.us-west-2.aws.ce.redhat.com/pulp/repos//content/dist/rhel/rhui/server/7/7Server/x86_64/os/repodata/e5ee2c196ee6525998525a2bf74bb40608dce199-filelists.sqlite.bz2: [Errno 14] HTTPS Error 404 - Not Found
Then, 5 minutes later, with absolutely no changes to my server's network or yum configuration:
    # host rhui2-cds01.us-west-2.aws.ce.redhat.com
    rhui2-cds01.us-west-2.aws.ce.redhat.com has address 18.104.22.168

    # yum provides seinfo
    Loaded plugins: amazon-id, rhui-lb
    setools-console-3.3.7-46.el7.x86_64 : Policy analysis command-line tools for SELinux
    Repo : rhui-REGION-rhel-server-releases
    Matched from:
    Filename : /usr/bin/seinfo
I find this extremely frustrating. With my small presence on EC2, I have no ability to troubleshoot what is causing these issues. However, I can confirm that there *are* issues as of today, that Amazon has been aware of connectivity and DNS failures for at least two days, and that Amazon is currently claiming that there are no issues.
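For anyone who wants a timestamped record of intermittent failures like these, here is a minimal sketch in Python. It is not something I ran during the incident. The function name `probe` and its parameters are hypothetical, and it only records what a hostname resolves to and whether port 443 accepts a TCP connection. It does not reproduce the yum 404 itself, since that depends on the repository contents and on however yum authenticates to the mirror.

```python
import socket
import time


def probe(hostname, port=443, timeout=5.0,
          resolve=socket.gethostbyname,
          connect=socket.create_connection):
    """Return one timestamped observation for a mirror hostname.

    `resolve` and `connect` are injectable (hypothetical parameters)
    so the function can be exercised without touching the network.
    """
    record = {
        "time": time.strftime("%Y-%m-%dT%H:%M:%S%z"),
        "host": hostname,
        "address": None,      # IPv4 address returned by DNS, if any
        "connected": False,   # True if a TCP connection was established
        "error": None,        # text of the first failure, if any
    }
    try:
        record["address"] = resolve(hostname)
        conn = connect((record["address"], port), timeout=timeout)
        conn.close()
        record["connected"] = True
    except OSError as exc:
        # DNS failures (socket.gaierror) and timeouts are OSError subclasses.
        record["error"] = str(exc)
    return record
```

Calling `probe("rhui2-cds01.us-west-2.aws.ce.redhat.com")` on a schedule and appending each result to a log file would give a simple timeline of when resolution or connectivity failed, which is at least something concrete to point to when a status page says everything is fine.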
This is quickly becoming the industry-standard mode of behavior for cloud computing providers: wild-eyed, outlandish promises of perfect availability, followed by regular connectivity failures that are haphazardly brushed under the rug.
Customers are owed transparency. I remain convinced that the only way to accomplish reliability is by "doing it yourself" and colocating servers in multiple datacenters, implementing and managing redundancy directly. The issue is too important to trust to hosting providers who have consistently demonstrated dishonesty.
See for yourself the almost invisible notice Amazon has posted to customers on their Service Health Dashboard:
[Image: Downtime? What Downtime?]