June 2018 Newsletter
Summary
Read about the latest improvements to our indexing system, changes affecting Instagram image search, and why you want rel=canonical links.
Indexing Transition Update
Relevance Improvements
We’ve been busy and have introduced some changes to our relevancy ranking that we didn’t want to save until the June Release Notes next month. To start, we were leveraging the default Elasticsearch/Lucene relevance, but we find that for our customers we need to consider more metadata elements in order to surface the most generally useful items.
Live Now! Page date. Fresher content is more relevant most of the time. When a file has a Created date 30 days or older, it will begin to be demoted. The older it is, the more it will be demoted. Pages without dates will not be demoted.
Coming next week: File type. We’ve had feedback that PDFs and other static files clog up the Page 1 results for some sites on our new index. Non-HTML format files (.pdf, .xlsx, etc.) will be demoted, so that HTML format pages with similar relevance scores will bubble to the top.
Indexing Redirected Pages
Best practice is to include only “real” URLs on an XML sitemap. Sometimes sitemap formatting or contents don’t get updated when changes happen on a site, and the URLs on the sitemap redirect to other URLs. This includes the transition from HTTP to HTTPS - different protocols result in technically different URLs.
When our system encounters a link that has a server-side 301 redirect, we will follow it and index the target URL.
However, if the redirect is a client-side redirect, implemented by the content management system or other means, we do not follow the redirect at this time. We’re working on following client-side redirects, but until we have this in place, we may not have the URLs you actually want us to have.
This brings us back to best practice, which is to add to your sitemap only the URLs you want indexed. If you don’t want a given URL indexed, it would be cleanest to leave it off your sitemap.</td>
Instagram Image Search
A change in policy at Instagram affects our ability to display thumbnail images
Summary: Instagram images will not be searchable for a few weeks while we reconfigure our connection to their system. Flickr and MRSS image search will continue to function as expected.
Last week, Instagram changed their policy on how apps can use thumbnail images. They published a message saying that if an app was broken, this was likely why. Indeed, our app isn’t able to display Instagram thumbnails currently. They also changed their API last year, and now require account authentication from an app to grant explicit access to an account’s image. As a result, we were no longer able to index new Instagram content, though we’ve continued to support search for the images we had indexed prior to the API change.
The change last week means that we will need to re-work our integration in order to display their thumbnails. While we consider this change, we will be disabling Instagram search in our system, so that your searchers don’t get broken thumbnail images in the results. Images from Flickr or MRSS will continue to be searchable.
Questions? Reach out - if you’re interested in an open call to discuss Instagram indexing generally, let us know that too.
SEO Pro Tip
Use rel=canonical
links if you have multiple URL formats for the same content
Because multiple URLs can serve up the same or similar content, canonical URLs are an important piece to consider to improve your site’s SEO and, ultimately, your visitors’ search experience.
A "canonical link" refers to an HTML <link>
element, with a rel="canonical" attribute
, found in the <head>
of your webpage. It specifies to search engines your preferred URL for that page.
For example:
https://example.gov/topics
vs http://www.example.gov/topics.html
vs https://m.example.gov/topics
vs https://www.example.gov/topics?sort=desc
All of these URLs could take your visitor to the same page, but for a better search experience, we want only one of them appearing in the search results. Pick one version that is the “true” version and list that in the head of the page as the canonical link. By doing this, if the page shows up for a crawler at any other URL version, the search engine will only keep the one true URL. Read more and see examples:
SEO Best Practices for Canonical URLs + the Rel=Canonical Tag (Moz)
- Consolidate duplicate URLs (Google)
Webinar Recordings
If you missed our training webinars in May, we have recordings of the sessions available. Both are about 1 hour long:
How Search Engines Index Your Websites
Release Notes
Want to learn about the latest features, fixes, and focuses of the Search team? We post monthly Release Notes on our website.