
How to improve PDF discoverability

You may have heard that PDFs are not the ideal document type for Search Engine Optimization (SEO). This is because much of the SEO value of any file comes from metadata embedded in the file. For PDFs, this metadata must be added to each document file individually, using a program such as Adobe Acrobat, and this step is often overlooked. Non-HTML documents also rank lower than HTML documents in our ranking algorithm. However, if you do have PDFs on your website, we have some tips to improve their discoverability.

Similar to web pages, Search.gov relies on structured metadata in your PDF files to present them in search results. By following these suggestions in preparing your PDF files, you will improve the quality of the data in our index and the file’s ability to appear in the results rankings.

Screenshot showing the Document Properties of a PDF. Editable fields include Title, Author, Subject, and Keywords. It also displays the file name, and the created and modified dates of the document.

Choose a descriptive file name

Example:

file-title-or-form-name-and-number.pdf

Detail: Similar to a title, a descriptive file name makes the file's content clear when a user downloads it. We strongly recommend that you not use the default file name suggested by your scanner or PDF program. Also insert the document title into the file properties using Adobe Acrobat or another PDF program: if a title is not set in the PDF file properties, the file name will appear on the search results page in place of the title. It's best practice to use hyphens, rather than underscores, to separate words. Avoid using space characters and these uncommon characters in your file names.

Used In: Query matching, term frequency matching, and, if no title is set, as the search result title.
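As a rough sketch of the naming advice above, the Python function below turns a document title into a lowercase, hyphen-separated file name with no spaces, underscores, or punctuation. The function name and the lowercase convention are illustrative choices, not a Search.gov requirement, and the form title in the example is hypothetical.

```python
import re
import unicodedata


def descriptive_pdf_filename(title: str) -> str:
    """Turn a document title into a lowercase, hyphen-separated .pdf file name."""
    # Decompose accented letters (é -> e + accent) and drop anything non-ASCII.
    ascii_title = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    # Collapse every run of spaces, underscores, and punctuation into one hyphen.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", ascii_title).strip("-").lower()
    if not slug:
        raise ValueError("title has no letters or digits to build a file name from")
    return slug + ".pdf"


print(descriptive_pdf_filename("Application for Benefits, Form 123_A"))
# application-for-benefits-form-123-a.pdf
```

Collapsing runs of separators into a single hyphen keeps names readable even when the source title contains commas, colons, or underscores.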


Ensure text in the document is searchable

Detail: Search engines can't read the content of image-only PDFs. Many PDFs are now created digitally, with the text embedded in the file. However, a PDF created by scanning is often just an image with no embedded text, which means its content cannot be used to help find the file in search. Run all scanned PDFs through Optical Character Recognition (OCR) to convert them from images into fully searchable text. For guides on running OCR, we suggest How to search a PDF (instructions) or How to Create a Searchable PDF File (video).

Used In: Query matching, term frequency matching
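One way to spot scanned files that still need OCR is to extract the text of each page and flag pages with almost none. The sketch below assumes you have already extracted per-page text with a PDF library (with pypdf, for example, `[page.extract_text() for page in PdfReader(path).pages]`). The function name and the 20-character threshold are hypothetical, and the check is only a heuristic.

```python
from typing import Optional


def pages_needing_ocr(page_texts: list[Optional[str]], min_chars: int = 20) -> list[int]:
    """Return 1-based numbers of pages with too little extractable text to be searchable."""
    flagged = []
    for number, text in enumerate(page_texts, start=1):
        # Count only non-whitespace characters; treat a missing result as empty.
        visible = "".join((text or "").split())
        if len(visible) < min_chars:
            flagged.append(number)
    return flagged


print(pages_needing_ocr(["Hello world this is a page with text", "", None]))
# [2, 3]
```

A page that is mostly an image, such as a scanned form, will typically yield little or no text and end up in the returned list.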

Add a title

Example:

Title: Unique title of the PDF file

Detail: The title should be the unique, document-specific title of the PDF. Search.gov uses it much like an HTML title tag and displays it in the list of search results. If the title field is left blank in the PDF properties, the file name is displayed instead. You can add a title to your file by updating the file properties in a program such as Adobe Acrobat.

Used In: Query matching, term frequency matching
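To preview how a PDF might be labeled in results, the following sketch mirrors the fallback described above: use the Title property when it is set, otherwise show the file name taken from the URL. It illustrates the documented behavior only and is not Search.gov's actual code. The function name and URLs are placeholders.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse


def result_title(pdf_title: Optional[str], pdf_url: str) -> str:
    """Preview the title a search result would show for a PDF (illustrative only)."""
    # A non-blank Title property wins.
    if pdf_title and pdf_title.strip():
        return pdf_title.strip()
    # Otherwise fall back to the file name: the last segment of the URL path,
    # ignoring any query string or fragment.
    return PurePosixPath(urlparse(pdf_url).path).name


print(result_title("", "https://www.example.gov/docs/scan0001.pdf"))
# scan0001.pdf
```

The example shows why a default scanner file name is a poor fallback: without a title, it is exactly what searchers would see.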

Add a description (in Adobe labeled as ‘Subject’)

Example:

Subject: A description of the PDF's content. This is a great place to use synonyms, combining plain language with the keywords people search for. Aim for 160 characters or fewer.

Detail: The description should be a well-crafted, plain language summary of this particular file. Search engines will often use it in place of a page snippet. Include the relevant keywords you want the file to rank well for. Ideally, limit your description to 160 characters to prevent it from being truncated on the search results page. You can add it by updating the file properties in a program such as Adobe Acrobat. (Note that in Adobe > Properties, the description field is labeled 'Subject'.)

Used In: Query matching, term frequency matching
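Before saving a description, you can preview roughly how a long one might be cut off. The sketch below assumes a simple 160-character cutoff at a word boundary; actual truncation on the results page may differ, and the function name is hypothetical.

```python
def preview_description(subject: str, limit: int = 160) -> str:
    """Return the description as-is if it fits, else a word-boundary cut with an ellipsis."""
    text = " ".join(subject.split())  # collapse runs of whitespace
    if len(text) <= limit:
        return text
    # Leave room for the ellipsis, then back up to the last space so no word is split.
    cut = text[: limit - 1].rsplit(" ", 1)[0]
    return cut.rstrip() + "…"


print(preview_description("Apply for a passport."))
# Apply for a passport.
```

If the preview ends in an ellipsis, the most important words should be moved toward the front of the description.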

Add keywords

Example:

Keywords: Relevant Keyword, Applicable Keyword, Pertinent Keyword, Related Keyword

Detail: List the terms the public would use to find this document. This can be added through updating the file properties in a program such as Adobe Acrobat. Separate keywords using a comma. Both commas and semicolons are supported by Adobe, but our system currently only supports commas.

Used In: Query matching, term frequency matching
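Because Search.gov currently splits keywords only on commas, it can help to normalize a keyword list before pasting it into the Keywords field. This hypothetical helper accepts commas or semicolons, trims whitespace, drops blanks and case-insensitive duplicates, and rejoins the terms with commas.

```python
import re


def normalize_keywords(raw: str) -> str:
    """Split on commas or semicolons, trim, de-duplicate, and rejoin with commas."""
    seen = set()
    keywords = []
    for term in re.split(r"[;,]", raw):
        term = " ".join(term.split())  # trim and collapse inner whitespace
        key = term.casefold()          # compare case-insensitively
        if term and key not in seen:
            seen.add(key)
            keywords.append(term)
    return ", ".join(keywords)


print(normalize_keywords("passport; travel,  Passport ;visa,,"))
# passport, travel, visa
```

The first spelling of a repeated term is kept, so put the preferred capitalization first.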

Declare the file language

Example:

Language: English

Detail: When a file language isn't set, the Search.gov system does its best to analyze the content and determine the language. Typically this is not a problem. However, if the file wasn't run through OCR and the only text the system finds is an image file name, or if the file is an old scan in which OCR misidentified many characters, our system may detect the wrong language. Search.gov advises setting the language explicitly to avoid these issues; this is often an optional setting when running files through OCR.

Used In: Indexing

Create HTML landing pages for your PDFs

Detail: If you specifically want to direct traffic to PDFs, consider creating an HTML landing page that is optimized for search using traditional semantic metadata. You can find more information on semantic metadata in the Search.gov help manual. You could also choose to index only the landing page, rather than indexing your PDFs and updating them with document metadata.
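A minimal landing page might carry the same title and description you would put in the PDF's properties, plus a link to the file. The sketch below builds one with Python's standard library. The function name, the `lang` default, and the markup are assumptions; check the semantic metadata guidance in the Search.gov help manual for the tags it actually recommends.

```python
from html import escape


def landing_page_html(title: str, description: str, pdf_url: str, lang: str = "en") -> str:
    """Build a minimal HTML landing page for one PDF (hypothetical helper)."""
    # Escape supplied text so characters like & and " can't break the markup.
    t, d, u = escape(title), escape(description), escape(pdf_url)
    return f"""<!DOCTYPE html>
<html lang="{escape(lang)}">
<head>
  <meta charset="utf-8">
  <title>{t}</title>
  <meta name="description" content="{d}">
</head>
<body>
  <h1>{t}</h1>
  <p>{d}</p>
  <p><a href="{u}">{t} (PDF)</a></p>
</body>
</html>
"""


print(landing_page_html("Form 123 Instructions", "How to complete Form 123.", "/docs/form-123.pdf"))
```

Reusing the PDF's Title and Subject text keeps the landing page and the file consistent in search results.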

A note about date metadata

Detail: If you view the properties of a PDF, you will notice that date fields are not as easily modified as the title, description ("Subject"), and keywords. The dates associated with a PDF are the Created date, which reflects when the PDF was originally produced, and the Modified date, which reflects the last time changes were saved to the document.

Used In: Ranking fresher content higher than older content in the results.
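Inside the file, these dates are usually stored in the document information dictionary as `/CreationDate` and `/ModDate`, using the PDF date format `D:YYYYMMDDHHmmSSOHH'mm'`. If you want to audit the dates on your files, the sketch below parses such strings into Python datetimes. It assumes you have already read the raw strings with a PDF library, and it treats the `D:` prefix and every field after the year as optional; the function name is hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# D:YYYYMMDDHHmmSSOHH'mm' where O is +, - or Z; fields after the year are optional.
_PDF_DATE = re.compile(
    r"^(?:D:)?(?P<year>\d{4})(?P<month>\d{2})?(?P<day>\d{2})?"
    r"(?P<hour>\d{2})?(?P<minute>\d{2})?(?P<second>\d{2})?"
    r"(?:(?P<sign>[Zz+-])(?:(?P<tzh>\d{2})'?(?:(?P<tzm>\d{2})'?)?)?)?$"
)


def parse_pdf_date(value: str) -> datetime:
    """Convert a PDF date string such as "D:20230415093000-04'00'" to a datetime."""
    match = _PDF_DATE.match(value.strip())
    if match is None:
        raise ValueError(f"not a PDF date string: {value!r}")
    g = match.groupdict()
    tz: Optional[timezone] = None  # no offset given: return a naive datetime
    if g["sign"] in ("Z", "z"):
        tz = timezone.utc
    elif g["sign"]:
        offset = timedelta(hours=int(g["tzh"] or 0), minutes=int(g["tzm"] or 0))
        tz = timezone(offset if g["sign"] == "+" else -offset)
    return datetime(
        int(g["year"]), int(g["month"] or 1), int(g["day"] or 1),
        int(g["hour"] or 0), int(g["minute"] or 0), int(g["second"] or 0),
        tzinfo=tz,
    )


print(parse_pdf_date("D:20230415093000-04'00'").isoformat())
# 2023-04-15T09:30:00-04:00
```

Comparing the parsed Modified date with the date a document was last substantively updated is one way to find files whose freshness signal is misleading.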

Resources

Adobe provides information about editing document metadata through XML files for PDF documents created in Acrobat 5.0 or later, and you can update the Modified date to the current date and time by re-saving your PDF file. Beyond that, there is no simple way to edit these fields freely. If you want more control over the dates associated with your PDF files, one approach to consider is using an RSS feed in your Search.gov Admin Center to feed PDFs into your indexed items.
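If you take the RSS route, a feed lets you state each PDF's date yourself in `pubDate`. The sketch below generates a minimal RSS 2.0 feed with Python's standard library. It is an assumption about what such a feed could look like: which feed elements Search.gov reads is not covered here, so confirm the requirements in the Admin Center documentation before relying on it. The function name, channel details, and URLs are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_pdf_feed(channel_title: str, channel_link: str, items: list[dict]) -> str:
    """Return a minimal RSS 2.0 document listing PDF files (hypothetical helper)."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = f"PDF documents from {channel_title}"
    for item in items:
        entry = ET.SubElement(channel, "item")
        ET.SubElement(entry, "title").text = item["title"]
        ET.SubElement(entry, "link").text = item["url"]
        ET.SubElement(entry, "description").text = item["description"]
        # RSS 2.0 uses RFC 822-style dates; pass a timezone-aware datetime.
        ET.SubElement(entry, "pubDate").text = format_datetime(item["published"])
    return ET.tostring(rss, encoding="unicode")


feed = build_pdf_feed(
    "Example Agency PDFs",
    "https://www.example.gov/",
    [{
        "title": "Annual Report 2023",
        "url": "https://www.example.gov/docs/annual-report-2023.pdf",
        "description": "Summary of the agency's 2023 programs and spending.",
        "published": datetime(2023, 4, 15, 13, 30, tzinfo=timezone.utc),
    }],
)
print(feed)
```

Because the date comes from your own data rather than the PDF's internal metadata, you control what freshness each item reports.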

We encourage you to read more about improving PDF content for search at the following links: