Showing posts with label bing webmaster tools. Show all posts

Wednesday, January 20, 2016

Microsoft search indexing can be so aggressive that it resembles DoS traffic

As part of my consulting business I have a number of web servers I take care of. This morning, I woke up to receive a particularly crappy message related to one of those servers:

possible DoS attack

Awesome, right? Ever notice how you never get these sorts of messages between the hours of 9 AM and 5 PM, Monday through Friday?

So I tried to SSH into the target server, and was pleased to find I was able to connect. Relieved that this was likely a false alarm, I found this in the Apache logs:

40.77.167.20 - - [19/Jan/2016:19:43:15 -0500] "GET /robots.txt HTTP/1.1" 200 146
40.77.167.20 - - [19/Jan/2016:19:43:15 -0500] "GET /robots.txt HTTP/1.1" 200 146
40.77.167.20 - - [19/Jan/2016:19:43:15 -0500] "GET /robots.txt HTTP/1.1" 200 146
40.77.167.20 - - [19/Jan/2016:19:43:15 -0500] "GET /robots.txt HTTP/1.1" 403 5
40.77.167.20 - - [19/Jan/2016:19:43:15 -0500] "GET /robots.txt HTTP/1.1" 403 5
40.77.167.20 - - [19/Jan/2016:19:43:15 -0500] "GET /css/main.css HTTP/1.1" 403 5

Take note of the timeframe on these connections: six connections from the same IP address within one second, five of which were for the same file. Also note that the initial connections were successful - the 403 errors only began because my Apache config blocks suspicious traffic.
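If you want to spot this sort of burst without eyeballing the log, a rough sketch like the following counts requests per IP per logged second. The function name `find_bursts` and the threshold of 5 are my own illustrative choices, not anything taken from my Apache config, and the sample lines are the ones shown above.

```python
import re
from collections import Counter

# Matches the start of an Apache common/combined log line: client IP, two
# identity fields, then the bracketed timestamp (one-second resolution).
LOG_PREFIX = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\]')


def find_bursts(lines, threshold=5):
    """Return {(ip, timestamp): hits} for each IP that made at least
    `threshold` requests within one logged second.

    `threshold` is an arbitrary illustrative value (an assumption).
    """
    hits = Counter()
    for line in lines:
        match = LOG_PREFIX.match(line)
        if match:
            hits[match.groups()] += 1  # key is (ip, timestamp)
    return {key: count for key, count in hits.items() if count >= threshold}


# The six log lines quoted in this post.
SAMPLE = [
    '40.77.167.20 - - [19/Jan/2016:19:43:15 -0500] "GET /robots.txt HTTP/1.1" 200 146',
    '40.77.167.20 - - [19/Jan/2016:19:43:15 -0500] "GET /robots.txt HTTP/1.1" 200 146',
    '40.77.167.20 - - [19/Jan/2016:19:43:15 -0500] "GET /robots.txt HTTP/1.1" 200 146',
    '40.77.167.20 - - [19/Jan/2016:19:43:15 -0500] "GET /robots.txt HTTP/1.1" 403 5',
    '40.77.167.20 - - [19/Jan/2016:19:43:15 -0500] "GET /robots.txt HTTP/1.1" 403 5',
    '40.77.167.20 - - [19/Jan/2016:19:43:15 -0500] "GET /css/main.css HTTP/1.1" 403 5',
]

print(find_bursts(SAMPLE))
```

Grouping on the raw timestamp string works here only because Apache logs at one-second resolution; a sliding window would be needed to catch bursts that straddle two seconds.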

You've probably guessed who this IP address belongs to if you read the headline to this article:

NetRange: 40.74.0.0 - 40.125.127.255
NetName: MSFT
Organization: Microsoft Corporation (MSFT)

At first I thought this IP might be part of Microsoft's cloud server system, Azure, or some other product that might be operated by customers. However, that seemed unlikely as this host was going after the robots.txt file and nothing else other than CSS. That is what search engine spiders do. And this IP very much looks like part of Microsoft's search infrastructure:

# host 40.77.167.20
20.167.77.40.in-addr.arpa domain name pointer msnbot-40-77-167-20.search.msn.com.

The day after these weird connections, the same Microsoft IP came back with a more normal traffic pattern:

40.77.167.20 - - [20/Jan/2016:06:53:35 -0500] "GET /robots.txt HTTP/1.1" 200 237
40.77.167.20 - - [20/Jan/2016:06:53:36 -0500] "GET /index.html HTTP/1.1" 301 245
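The reverse lookup shown earlier is only half of the usual crawler check, since anyone who controls their own reverse DNS can claim any hostname. A minimal sketch of the forward-confirmed version follows. The function name `is_msn_crawler` and the injectable lookup parameters are hypothetical conveniences of mine (they make the logic testable without a network), and treating `.search.msn.com` as the expected suffix is an assumption based on the hostname above.

```python
import socket


def is_msn_crawler(ip, reverse_lookup=None, forward_lookup=None):
    """Forward-confirmed reverse DNS check for a claimed search.msn.com host.

    reverse_lookup(ip) -> hostname and forward_lookup(hostname) -> list of IPs
    default to the standard library resolvers; they are parameters only so the
    logic can be exercised offline.
    """
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda name: socket.gethostbyname_ex(name)[2]

    try:
        hostname = reverse_lookup(ip)
    except OSError:  # covers socket.herror and socket.gaierror
        return False

    # Step 1: the PTR record must sit under the expected domain (assumed suffix).
    if not hostname.rstrip(".").lower().endswith(".search.msn.com"):
        return False

    # Step 2: the hostname must resolve back to the address we started with.
    try:
        return ip in forward_lookup(hostname)
    except OSError:
        return False
```

Calling `is_msn_crawler("40.77.167.20")` with no extra arguments performs live DNS lookups, so the answer depends on whatever the records say at the time you run it.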

A standard installation of mod_evasive would result in a temporary blacklist for this kind of traffic. It is unclear whether this behavior was intentional on Microsoft's part, or whether more rapid requests for files can be expected. The people who make their bread and butter spreading SEO gossip seem to agree that connectivity failures and web server 5xx errors can have an impact on search engine rankings. However, such reports should be taken as just that - gossip.

Both Google and Bing report errors encountered during site indexing, through Search Console and Webmaster Tools respectively, but I wasn't able to find anything published by either company about how such errors affect search engine placement, even in vague terms. Hopefully this was a one-time error on Microsoft's part and not part of a new approach to indexing (fingers crossed).

Wednesday, September 3, 2014

Schadenfreude + Irony = Blog Post

So I am looking around in one of Microsoft's websites for web development tips when I come across this:

[Screenshot: a redirect loop error on a Microsoft site. Caption: D'oh]
It's really one of the worst possible places to put one of those.

Saturday, October 6, 2012

Bing Webmaster Tools and Blogger Sitemaps - "The Feed is Empty" Error Fixed

Blogging tips aren't really the focus of my website. However, I recently signed up for Bing's Webmaster Tools and had some difficulty submitting the sitemap for my Blogger website, joshwieder.blogspot.com. It took a fair bit of head banging before I figured out how to resolve it, thanks in no small part to the number of guides on this issue that are just wrong on their face. So, for regular blog readers: this isn't as advanced as a lot of the articles here, but it turned out to be such a nuisance and so ill-documented that I felt something had to be done.

This guide will assume that you have already added your site to Bing Webmaster Tools and verified domain ownership, both of which work as advertised.

At the time of this writing, Blogger primarily uses Atom 1.0 to publish site feeds - Blogger also relies on Atom for a dynamically generated sitemap. For Google's webmaster tools, I registered the following link, which populates an XML file that contains the first 500 pages of data on my site:

/atom.xml?redirect=false&start-index=1&max-results=500

This worked just fine with Google, so I assumed there would be little issue with Bing. Wrong! No matter what I did, every Blogger sitemap URL that I could find in the documentation and guides ended up throwing an error after Bing crawled it - usually the error I received was "The Sitemap is Empty". Before the big reveal, here is a quick rundown of the misinformation I found online about how to add Blogger sitemaps:

-Submit a sitemap through a URL like this: http://www.bing.com/webmaster/ping.aspx?siteMap=http%3A%2F%2Fjoshwieder.blogspot.com%2Fatom.xml%3Fredirect%3Dfalse%26start-index%3D1%26max-results%3D500
-Create a custom robots.txt file with a Sitemap line for /atom.xml?redirect=false&start-index=1&max-results=500
-Use feeds/posts/default instead of /atom.xml. This gets bonus points for at least conceivably being able to work, rather than being just a different way of configuring exactly the same setting. It still didn't work for me; if I remember correctly, this produced an "Unsupported File Format" error rather than "The Sitemap is Empty".
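For reference, the long ping URL in the first suggestion is nothing more than the sitemap address percent-encoded and appended to Bing's ping endpoint. A small sketch of how it is put together follows (the helper name `bing_ping_url` is mine). Keep in mind this only shows how that URL is formed; submitting it did not fix the error for me.

```python
from urllib.parse import quote

# Ping endpoint exactly as it appears in the guides quoted above.
PING_ENDPOINT = "http://www.bing.com/webmaster/ping.aspx?siteMap="


def bing_ping_url(sitemap_url):
    """Percent-encode the whole sitemap URL (including ':', '/', '?', '&'
    and '=') so it can travel as a single query-string value."""
    return PING_ENDPOINT + quote(sitemap_url, safe="")


print(bing_ping_url(
    "http://joshwieder.blogspot.com/atom.xml"
    "?redirect=false&start-index=1&max-results=500"
))
```

Passing `safe=""` matters: by default `quote` leaves `/` unescaped, which would not reproduce the `%2F` sequences in the URL above.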

While different themes might lead to different conflicts, I am still amazed by how many different guides I found and how wrong all of them were. I am confident this fix will work across multiple themes (at least unless Blogger makes a change).

Time for the big reveal. Log in to Bing Webmaster Tools. Assuming again that your site has been added and domain ownership verified, it should look a bit like this:

[Screenshot: the Bing Webmaster Tools dashboard]

Click the Add a Sitemap link in the bottom right-hand corner. Or, from the left-hand menu under Dashboard, expand Configure my Site and click Sitemaps.

The syntax that worked for me is http://yourblogurlhere.ext/sitemap.xml. So for joshwieder.blogspot.com, I put http://joshwieder.blogspot.com/sitemap.xml
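Before submitting anything, it can be handy to check what kind of XML document a given URL actually serves, since an Atom feed and a sitemap are different formats. The sketch below is a rough local check, not a description of what Bing does internally. The function name `document_kind` and the two sample documents are made up for illustration, and the root element names assume the standard sitemaps.org and Atom formats.

```python
import xml.etree.ElementTree as ET


def document_kind(xml_text):
    """Classify an XML document by its root element.

    Returns "sitemap" for a sitemaps.org <urlset> or <sitemapindex>,
    "atom" for an Atom <feed>, "rss" for <rss>, otherwise "unknown".
    """
    root = ET.fromstring(xml_text)
    # ElementTree reports namespaced tags as "{namespace}name"; keep the name.
    local_name = root.tag.rsplit("}", 1)[-1]
    if local_name in ("urlset", "sitemapindex"):
        return "sitemap"
    if local_name == "feed":
        return "atom"
    if local_name == "rss":
        return "rss"
    return "unknown"


# Hand-written minimal samples (assumptions, not real responses from Blogger).
SITEMAP_SAMPLE = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>http://joshwieder.blogspot.com/</loc></url>"
    "</urlset>"
)
ATOM_SAMPLE = (
    '<feed xmlns="http://www.w3.org/2005/Atom">'
    "<title>example</title>"
    "</feed>"
)

print(document_kind(SITEMAP_SAMPLE), document_kind(ATOM_SAMPLE))
```

To run it against a live URL, you could fetch the body with `urllib.request.urlopen(url).read()` and pass the result to `document_kind`.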

You will notice that adding the sitemap does nothing immediately. If you are in a hurry, like I tend to be, use the Bing URL Submission Tool to have the sitemap verified right away. Bing will still take its time to compile statistics and spider your site, but this will at least show you whether the sitemap is valid.

Here is what the URL Submission Tool looks like:

[Screenshot: the Bing URL Submission Tool]

Using the sitemap URL format above should allow you to add a Blogger sitemap to Bing without errors as of this writing. Should anything change, or if you use Blogger and this still does not work for you, please leave a comment or send me an email, as I would be interested in finding out why. I still don't understand why this was such a hassle. For the conspiratorial readers: Is Google using some sort of secret Atom formatting to make other search engine submission tools look broken and awful? Is Bing secretly on the lookout for Google-sponsored hosting sites and giving them a hard time to make Google hosting look broken and awful? Or is this functionality just broken and awful, with no conspiracy behind it? You decide!
