A Beginner’s Guide on How to Use Screaming Frog

  • How to perform a site crawl on Screaming Frog
  • Using Screaming Frog to analyse key areas of your website
  • How to create custom configurations in Screaming Frog
  • Customising robots.txt files and locating orphan pages

If you are new to search engine optimisation (SEO) and are looking for tools to help search engine bots crawl your website, then this post is for you!

We regularly use Screaming Frog at White Horse Agency to support our technical SEO efforts.

Why?

Screaming Frog is a website crawler that allows you to perform a full technical audit and an in-depth check of your website’s health. Thanks to its versatility, you can also put Screaming Frog to other uses, including:

  • Competitor analysis
  • Outreach
  • Verifying schema markup

Screaming Frog is widely used by SEO experts. However, the tool can be a challenge if you’re new to the software. We’re going to explore a few of our favourite Screaming Frog features and detail how you can use them to get the most out of this SEO Spider tool.

But first, let’s break down the user interface.

Getting Started with Screaming Frog

After installing Screaming Frog, we advise familiarising yourself with the menu, options and settings to help you navigate the platform.

File

The first control element is the “File” option, which will house your last six crawls and allow you to set default settings for the software.

Configuration

The “Configuration” control element allows you to set and adjust custom settings, such as excluding specific URLs from the crawl and integrating Google Analytics or Google Search Console into crawls.

Bulk Export

As the name suggests, this option allows you to export crawl data in bulk, for example all of the inlinks or outlinks found for the URLs in your crawl.

Reports

The “Reports” control element creates downloadable crawl overviews and data reports.

Sitemaps

You can construct a website sitemap in this control element. Screaming Frog offers various sitemap options, so it’s great for large websites with a complicated site structure.

Visualisations

Screaming Frog has two visualisation types – a directory tree visualisation and a crawl visualisation.

While visualisations don’t offer the best way to diagnose issues, they can help provide perspective and highlight underlying patterns in data that may be difficult to identify using traditional reporting methods.

How to Crawl Your Site

When performing a crawl in Screaming Frog, the software defaults to Spider mode and conducts the audit according to the configurations and filters you have set.

You perform an audit by entering the URL into the search bar near the top of the software and clicking “Start”. Alternatively, you can upload your sitemap by changing the “Mode” to “List”, which instructs the platform to crawl the URLs listed in the sitemap.

Now that we’ve covered how to get started with Screaming Frog, let’s delve into how you can use the software to support SEO tasks.

Backlink Analysis

You can use Screaming Frog to identify low-quality backlinks on your website, which you might need to disavow.

Knowing the quality of your backlinks is extremely important, especially if you’re comparing your website’s performance to competitors or assessing a new client’s site to see how it is faring.

You can integrate other analytical tools such as Ahrefs, Google Analytics and Majestic with the SEO Spider software. All you’ll need is your API key to connect the accounts.

Using Majestic as an example, a tool with powerful backlink tracking, we’re going to show you how to use it in conjunction with Screaming Frog.

After connecting the two accounts and performing a site audit, Screaming Frog will return its usual data alongside the link metrics gathered from Majestic, which will appear in the “Link Metrics” tab.

If you’re analysing your own data, you can use backlink metrics to assess your performance against competitors, paying particular attention to the differences in engagement levels across your top-performing content.

You can also look deeper into your internal and external links to see how they’re being used, where they link to and if page authority is being passed down to lower-level pages in your sitemap.

Analysing Images

If your images aren’t correctly optimised, this could lead to a slow page loading speed when users try to access your web pages.

Use Screaming Frog to determine your image sizes and identify any that may slow response times. You can also use Screaming Frog to review alt text and identify images with display issues.

To find data on your images, go to the “Images” tab, where you can filter by size or other factors to go through the ones that may be causing issues on your site and optimise accordingly.
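If you’d like to sanity-check image weights outside the tool, a short script can flag heavy files. The sketch below is only illustrative: the image URLs and the 100 KB threshold are assumptions, and in practice you’d take the URL list from an export of the “Images” tab.

```python
import requests

# Hypothetical image URLs - in practice, take these from an Images tab export
image_urls = [
    "https://www.example.com/images/hero-banner.jpg",
    "https://www.example.com/images/team-photo.png",
]

SIZE_LIMIT = 100 * 1024  # illustrative 100 KB threshold

for url in image_urls:
    # A HEAD request fetches headers only, so we can read the file size cheaply
    response = requests.head(url, allow_redirects=True, timeout=10)
    size = int(response.headers.get("Content-Length", 0))
    if size > SIZE_LIMIT:
        print(f"{url}: {size / 1024:.0f} KB - consider compressing")
```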

Custom Extractions with Regex, CSS and XPath

Screaming Frog scrapes essential information about your website by default. However, if you’re looking for something specific, there are two features you can use to conduct an advanced site crawl and analysis: Custom Searches and Custom Extractions.

You’ll find the Custom Search feature under the Configuration element in the menu. It allows you to search for a specific line of text within the source code of your web pages. For example, if you own an eCommerce website, this feature could help you identify which of your products are marked “Out of Stock” and decide whether those pages are still needed or should be removed.

On the other hand, the Custom Extractions feature, which is located under the same Configuration element, collects data from the HTML source via three paths:

XPath

XPath, an abbreviation of XML Path Language, extracts HTML elements from a web page, meaning any information contained in a div, span, p or heading tag.

Google Chrome has also made it easier to export XPath. Right-click on the element within the Inspect tool and go to Copy > Copy XPath. While you might need to tweak the syntax, you can paste the expression into Screaming Frog to perform the extraction.
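If you want to check an XPath expression before pasting it into a Custom Extraction, you can test it locally. This is a minimal sketch using the lxml library against a hypothetical product-page snippet; the expression is the part you’d enter into Screaming Frog.

```python
from lxml import html

# Hypothetical product-page snippet; in practice you'd test against real page source
page_source = """
<html><body>
  <h1 class="product-title">Trail Running Shoes</h1>
  <span class="stock-status">Out of Stock</span>
</body></html>
"""

tree = html.fromstring(page_source)

# The same expression can be pasted into a Custom Extraction set to XPath
print(tree.xpath("//span[@class='stock-status']/text()"))  # ['Out of Stock']
```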

CSS Path

You can also scrape data from your site using a CSS Path, which uses CSS selectors to target elements, with the option of adding an attribute field. This is probably the quickest of the three methods for extracting data.
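To give a feel for what a CSS Path extraction looks like, the sketch below tests a selector locally with BeautifulSoup. The markup and selector are hypothetical, but the selector is the kind of value you’d enter into a Custom Extraction set to CSSPath.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the selector below mirrors what you'd enter into Screaming Frog
page_source = """
<div class="product">
  <span class="price" data-currency="GBP">£49.99</span>
</div>
"""

soup = BeautifulSoup(page_source, "html.parser")
for element in soup.select("div.product span.price"):
    # The optional attribute field in Screaming Frog maps to reading an attribute here
    print(element.get_text(strip=True), element.get("data-currency"))  # £49.99 GBP
```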

Regex

Regex (short for regular expression) is a string of text used to define search patterns in data. You can also use Regex to extract schema markup in JSON-LD format and tracking scripts; however, as it is pretty complex, it’s best suited to more advanced users.
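To show how a Regex extraction works in principle, here is a minimal sketch that pulls a JSON-LD block out of a hypothetical page source. The pattern is purely illustrative and would usually need adjusting for real-world markup.

```python
import re
import json

# Hypothetical page source containing a JSON-LD schema block
page_source = """
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product", "name": "Trail Running Shoes"}
</script>
"""

# An illustrative pattern of the kind you'd enter into a Custom Extraction set to Regex
pattern = r'<script type="application/ld\+json">(.*?)</script>'

for match in re.findall(pattern, page_source, re.DOTALL):
    data = json.loads(match)
    print(data.get("@type"), "-", data.get("name"))  # Product - Trail Running Shoes
```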

Creating Custom Configurations

If you want to scrape specific information, you might need to set custom configurations to perform an audit in your preferred way. Before the latest update, you would have to add your custom configuration settings each time you wanted to switch between crawls on different sites.

Now you can save your custom configuration profiles in Screaming Frog. All you have to do is go to File > Configuration > Save As.

You can save an unlimited number of these and even share them with other users – useful if other people in your team need to access these settings.

Customising Robots.txt Files

A robots.txt file is a text file that web admins create to instruct search bots how to crawl URLs, specifically, which links they should and should not crawl.

The robots.txt file plays a significant role in the overall management of a website. For example, failure to properly manage disallow entries, which is the rule that tells crawl bots not to visit a specific URL, could prevent critical sections of the site from being crawled.

Screaming Frog allows you to run a site audit and ignore the robots.txt file, to help you identify whether key content is being blocked from crawls and take appropriate action. It’s also a helpful feature to utilise when setting up a new site or performing a site migration.

To set this option, go to Configuration > Robots.txt > Settings.

You can also create and customise your own rules for the website’s current robots.txt file within the “Custom” tab of the menu element mentioned above to determine how changes to the file could affect your site.

Running a site audit with the new rule in place will also give you an idea of how search engines would crawl your web pages.
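If you’d like a quick second opinion on a draft rule set before touching the live file, Python’s built-in robots.txt parser can check individual URLs against it. This is a minimal sketch; the rules and URLs below are hypothetical examples.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical draft rules to trial before editing the live robots.txt
draft_rules = """
User-agent: *
Disallow: /checkout/
Disallow: /search
Allow: /blog/
""".splitlines()

parser = RobotFileParser()
parser.parse(draft_rules)

# Check whether key URLs would still be crawlable under the draft rules
for url in ["https://www.example.com/blog/seo-guide/",
            "https://www.example.com/checkout/basket/"]:
    print(url, "->", "allowed" if parser.can_fetch("*", url) else "blocked")
```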

Sourcing Orphan Pages in Sitemap

An orphan page is a page that search bots cannot find via your internal linking structure, meaning users will also have difficulty accessing these web pages.

Orphan pages can occur for several reasons, including:

  • Old pages unlinked but left as published
  • Site architecture issues
  • CMS creating additional URLs as part of page templates
  • Pages that no longer exist but are being linked to via another website

While a small number of orphan pages isn’t a huge problem, it’s important to create a solid internal linking structure to help Google understand and rank your website better.

This is particularly significant for your top-level pages, as the more links a page receives, the more important it appears to Google. However, to discover orphan pages in Screaming Frog, you’ll need additional URL sources, such as your sitemap and other web tools like Google Search Console or Google Analytics.

Once you’re set, start by performing a website crawl of your site and sitemap, then head to the “Internal” tab and filter the URLs by HTML for export. Make sure you create separate files for URLs found in your site crawl and the sitemap, then export both files into different tabs within a Google Sheets document and remove any duplicates.
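If you’d rather not dedupe in Google Sheets, a short script can compare the two exports directly. This is a minimal sketch: the file names and the “Address” column header are assumptions based on a typical Screaming Frog CSV export.

```python
import csv

def load_urls(path, column="Address"):
    # Read a CSV export and return the set of URLs in the given column
    with open(path, newline="", encoding="utf-8") as f:
        return {row[column].strip() for row in csv.DictReader(f) if row.get(column)}

crawl_urls = load_urls("internal_html_crawl.csv")      # URLs reached via internal links
sitemap_urls = load_urls("internal_html_sitemap.csv")  # URLs listed in the XML sitemap

# Pages in the sitemap that the crawl never reached are orphan candidates
for url in sorted(sitemap_urls - crawl_urls):
    print(url)
```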

Alternatively, if you’ve previously configured a “Crawl Analysis”, the right-hand pane will show an overview of URLs that require attention post-crawl, including orphan URLs, which you can view by filtering under the Sitemap, Search Console and Analytics tabs.