TF-IDF SEO – everything you need to know

By Justin Lester

11 minutes to read


[et_pb_section fb_built=”1″ admin_label=”section” _builder_version=”4.23.1″ global_colors_info=”{}”][et_pb_row admin_label=”row” _builder_version=”4.16″ background_size=”initial” background_position=”top_left” background_repeat=”repeat” global_colors_info=”{}”][et_pb_column type=”4_4″ _builder_version=”4.16″ custom_padding=”|||” global_colors_info=”{}” custom_padding__hover=”|||”][et_pb_text admin_label=”Text” _builder_version=”4.23.1″ background_size=”initial” background_position=”top_left” background_repeat=”repeat” global_colors_info=”{}”]

Discover whether this little-known tool can help you outrank your competition.

One of the prerequisites for working in SEO is a drive for innovation – in fact, not a day goes by when we’re not looking for new ways to unravel the mysteries of Google and dominate the SERPs.

We constantly search for new, innovative ideas, and sometimes find ideas that have been around for a while but are rarely talked about.

Today, we’re going to be exploring TF-IDF SEO, an older yet little-known topic in the SEO world.

We’ll start by introducing you to the concept, going through the basics and how it relates to SEO. Then we’ll look at whether Google uses TF-IDF, as well as at its benefits.

Thereafter, we’ll move on to explaining how it works, and what TF-IDF tools you can look at.

Finally, we will explore whether TF-IDF can actually help you with your rankings, what alternatives are available, and our final verdict on the topic.

What TF-IDF is

Understanding the basics

TF-IDF is a statistic that measures the importance of a word in a body of text when compared to a larger collection of other documents. If a word appears many times in a document, that word becomes more important. But when that word also appears frequently in other documents, it loses importance.

TF-IDF stands for Term Frequency Inverse Document Frequency. It can also be written as TF*IDF, meaning that in this statistic, the Term Frequency is multiplied by the Inverse Document Frequency.

  • Term Frequency – how frequently a word appears on a particular page
  • Inverse Document Frequency – a measure of how rare that word is across all the pages in the collection: the more pages it appears on, the lower the value

So in simpler terms, TF-IDF shows which words matter most on a page once common words like ‘a,’ ‘an’ and ‘the’ have been discounted.

These words are so prevalent across all of the documents and pieces of writing on the internet that they don’t give an accurate indication of what a specific page is about.

Once these unimportant words are discounted, we’re left with the unique words that reliably explain what the topic of a document is (or at least give us a general idea).
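To make the calculation concrete, here is a minimal Python sketch of TF multiplied by IDF. It assumes one common textbook form of the formula (relative term frequency and an unsmoothed logarithmic IDF); search engines and SEO tools may well use other variants. The three-page collection and the function names are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tf_idf(documents):
    """Return one {word: score} dict per document.

    TF  = count of the word in the document / total words in the document
    IDF = log(number of documents / number of documents containing the word)
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each word occur?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores


# Hypothetical three-page "collection", for illustration only.
pages = [
    "the sofa and the bed are on sale",
    "the guide to seo and the keyword",
    "the hotel and the pool are open",
]
furniture_scores = tf_idf(pages)[0]
```

In this toy collection ‘the’ and ‘and’ appear on every page, so their IDF is log(1) = 0 and they score zero no matter how often they are used, while ‘sofa’, which appears on only one page, scores highest on the first page.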

Simplified TF-IDF in action

To visualise this in an oversimplified way, we can use a website page word counter, which will tell us what words appear most often on a webpage. This is the Term Frequency component of TF-IDF.

These are the results from an online furniture store:

  • The ‘All Keywords’ column shows us which words appear most frequently on the page. If all we had was this information, and we had never seen the webpage before, we’d have no idea what it was about. All we could say is that it had something to do with SEO.
  • The ‘Non-Common Keywords’ column reveals the unique terms that are present on the page. From this full list of unique keywords alone, we’re able to surmise that the page is about furniture, even though we haven’t actually read the page.

Clearly, TF-IDF can help you understand what a piece of text is about, even without reading the text first. Or, as is more often the case, without being able to understand the text like a human.
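A page word counter of this kind can be sketched in a few lines of Python. This is only an illustration of the two columns described above, not the tool we used: the short list of common words, the sample text and the function name are all assumptions.

```python
import re
from collections import Counter

# A tiny, hand-picked list of common words; real tools use far longer lists.
COMMON_WORDS = {"a", "an", "the", "and", "of", "to", "in", "for", "our", "is"}


def keyword_counts(text, top_n=5):
    """Return (all_keywords, non_common_keywords) as lists of (word, count)."""
    words = re.findall(r"[a-z]+", text.lower())
    all_counts = Counter(words)
    non_common = Counter(w for w in words if w not in COMMON_WORDS)
    return all_counts.most_common(top_n), non_common.most_common(top_n)


# Made-up page copy, for illustration only.
sample_page = (
    "Shop the best sofas and beds for the home. "
    "The sofas in our range are made to order, and the beds ship in a week."
)
all_keywords, non_common_keywords = keyword_counts(sample_page, top_n=3)
```

Here the most frequent word overall is ‘the’, which tells us nothing, while the non-common list is led by ‘sofas’ and ‘beds’.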

Can you guess what one of the main uses for TF-IDF is?

Automated text analysis and machine learning.

SEO and TF-IDF

TF-IDF is one of the older metrics used by search engines – but that doesn’t mean it’s not still relevant.

Search engines can use the system (or their own versions of it) alongside other metrics to accurately understand the topic of a particular webpage.

And for SEOs, TF-IDF can be used to ensure that our webpages are accurately conveying their topic and purpose to search engines.

But more than just being website page word counters, real TF-IDF SEO tools go even further by telling us the most important unique keywords on our competitors’ websites.

Does Google use TF-IDF in its algorithm?

TF-IDF has been used as a metric by Google for a long time. In addition to several other metrics, Google does use TF-IDF for information retrieval and determining what the most important words on a webpage are.

However, as Google has evolved over the years and with the addition of updates like Hummingbird, Google has gotten better at understanding things like search intent and content relevance.

This means that, although the statistic is still in use, Google is not relying on it as much as it used to.

The benefits of SEO TF-IDF

SEO TF-IDF is, essentially, competitor analysis. Using the system, you are able to determine what keywords your own content is missing out on by examining the content of your competitors. This allows you to go back and update your content, and hopefully rank better.

In my opinion, TF-IDF is practically useful in two situations:

  1. To determine if you’re using the most important words in your content sufficiently. However, this is quite easy to do naturally, so it is usually not something to obsess over.
  2. To learn what descriptors top-ranking content in a particular industry uses. The adjectives, nouns and verbs that I would use to write an SEO service page, for example, are a little bit different to the kind I would use to write copy for a hotel. TF-IDF tools can be great for giving writing in different niches the right diction, but if you’re an experienced writer then you may not need them.

However, these two uses are really only applicable for TF-IDF SEO tools that return single words. Some of these tools are capable of returning phrases as well, but that leans more towards topic modelling, which is something we will cover later in this article.

How SEO TF-IDF works

Let’s work through an example together.

Say we want to optimize a furniture website. Let’s look at how we would go about doing this using traditional keyword research, and what we would do differently using SEO TF-IDF.

Traditional keyword research

For our standard method of optimization, we would input our focus keyword (‘buy furniture’ in this example) into a tool like Google Ads and get a list of keywords like:

  • Sofa beds for sale
  • Online furniture
  • Buy bed
  • Buy sofa

This is great to get an idea of what other terms customers are using to find furniture for sale, but it doesn’t show us the terms that our competitors are using successfully on their websites that we might be missing out on…

Enter TF-IDF.

The TF-IDF SEO method

This time around, after inputting our focus keyword into a TF-IDF tool, we’re shown words that include things such as:

  • Mattresses
  • Account
  • Delivery
  • Fabric
  • Edition
  • Modern

These are the unique, important words that our competitors are using to successfully rank well for our target keyword.

And Google clearly values their content.

Therefore… we can ensure our own page is optimized with those terms too, and reap similar ranking and traffic rewards. At least, that’s the idea.
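As a rough sketch of that idea, the Python below ranks words that competitor pages use and our own page does not. It is not how any particular TF-IDF tool works: the scoring (term frequency summed over competitor pages, weighted by a smoothed IDF taken from a small background collection), the function names and all of the sample text are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def missing_terms(own_page, competitor_pages, background_pages, top_n=5):
    """Rank words that competitors use but own_page does not.

    Score = term frequency summed over competitor pages, multiplied by a
    smoothed IDF, log((1 + N) / (1 + df)) + 1, from the background pages.
    """
    own_words = set(tokenize(own_page))
    background = [set(tokenize(page)) for page in background_pages]
    n_background = len(background)

    def idf(word):
        doc_freq = sum(1 for doc in background if word in doc)
        return math.log((1 + n_background) / (1 + doc_freq)) + 1

    totals = Counter()
    for page in competitor_pages:
        tokens = tokenize(page)
        if not tokens:
            continue
        for word, count in Counter(tokens).items():
            if word not in own_words:
                totals[word] += (count / len(tokens)) * idf(word)
    return [word for word, _ in totals.most_common(top_n)]


# All text below is made up for illustration.
own = "buy furniture online sofas and beds for sale"
competitors = [
    "buy modern furniture with free delivery on sofas and mattresses",
    "modern sofas beds and mattresses with fast delivery",
]
background = [
    "book a hotel with a pool and free parking",
    "learn seo with our guide and tips",
    "the weather today with sun and rain",
]
suggestions = missing_terms(own, competitors, background, top_n=3)
```

With this sample data the three top suggestions are ‘modern’, ‘delivery’ and ‘mattresses’: both competitors use them, our page does not, and they never appear in the background pages. A word like ‘with’ is also missing from our page, but it ranks lower because it appears on every background page.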

TF-IDF SEO tools

To see SEO TF-IDF firsthand, and to learn more about how it works, take a look at some of these tools:

These tools are quite easy to use too, and focus on delivering results in 3 steps:

  1. Input your focus keyword and your URL (or text).
  2. Analyze the insights from your competitors’ content.
  3. Optimize your own content and achieve higher rankings.

Check them out, try their demos, and see if you think they can add value to your business.

Can TF-IDF really help you rank better?

By now, you’re probably thinking that TF-IDF is an interesting tool worthy of further investigation. However, not everyone believes in the power of TF-IDF for SEO. Here’s what its detractors say are its main flaws:

Irrelevant keywords and intent

By focussing only on the top pages in search results, TF-IDF SEO tools run the risk of analysing pages that aren’t actually your competitors. They could also be targeting websites that operate in different niches than your own website.

Additionally, the content on these sites could be too lengthy or even too shallow to provide useful comparison with your own content.

They could also be targeting content with the wrong search intent – informative rather than appealing to people looking to purchase a service, for example.

Finally, the list of keywords that TF-IDF tools provide you may also contain irrelevant entries that would not be suitable for inclusion with your content. You will need to manually review the results and choose only those keywords that will actually fit well with your content and its intention.

Originally, TF-IDF was intended for information retrieval only. And while we can use it to optimize content as well, ultimately that was not the end goal for which it was designed.

Sample size limitations

Google’s database for TF-IDF consists of all the pages on the internet that it has indexed. No other SEO tool has access to this database. As a result, the best they can do is use rough estimates, with uncertain accuracy.

In fact, TF-IDF tools frequently only examine the top 10 or 20 pages on Google’s search results.

John Mueller, senior webmaster trends analyst at Google, has also confirmed this point, stating that “you can’t reproduce this metric directly because it’s based on the overall index of all of the content on the web.”

Instead of focussing on TF-IDF, John urges webmasters to focus on providing their users with valuable and useful content that will stand the test of time.
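The effect of sample size on the IDF half of the statistic is easy to show with made-up numbers. In the sketch below, the page contents, the collection sizes and the function name are all hypothetical. The point is only that the same word gets a very different IDF depending on which collection it is measured against.

```python
import math


def idf(term, pages):
    """IDF = log(total pages / pages containing the term).

    Each page is represented as a set of words.
    """
    containing = sum(1 for page in pages if term in page)
    if containing == 0:
        raise ValueError(f"{term!r} does not occur in the collection")
    return math.log(len(pages) / containing)


# Hypothetical data: every one of the top 10 pages mentions furniture...
top_10 = [{"buy", "furniture", "delivery"} for _ in range(10)]
# ...while a much larger made-up collection is mostly about other things.
wider_web = top_10 + [{"news", "weather", "delivery"} for _ in range(990)]

idf_small = idf("furniture", top_10)     # log(10 / 10)
idf_large = idf("furniture", wider_web)  # log(1000 / 10)
```

Measured against the top 10 results alone, ‘furniture’ looks like a meaningless word (IDF of 0). Measured against the larger collection, it is a strong signal. A tool that only sees a handful of pages cannot tell the difference.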

Focus on keyword stuffing

If you’re ranking at position 2 for your target keyword, and you’ve already used it sufficiently in your content, then it’s unlikely that increasing its frequency is going to get you that top spot.

Yet when you’re given a report from a TF-IDF tool that lists TF-IDF scores for your own content and your competitors’, it can become tempting to stuff your content with keywords.

This comes from the mentality that the more times you use your keyword in your content, the better that content will rank.

Sure, when you write about a topic, you’ll naturally use its focus keywords a lot. It helps both people and Google to understand that it is the central focus of the page.

However, one can reach a point of diminishing returns whereby those keywords have been used so frequently that they aren’t contributing to the article or helping the reader. They’re just there for SEO.
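One simple way to spot this in your own copy is to measure how much of the text is taken up by the keyword. The sketch below is an illustration only: ‘keyword density’ here is our own rough measure, the function name and sample sentences are made up, and no particular value is being suggested as a safe threshold.

```python
import re


def keyword_density(text, keyword):
    """Share of the words in text taken up by occurrences of keyword.

    keyword may be a single word or a phrase of several words.
    """
    words = re.findall(r"[a-z]+", text.lower())
    key = re.findall(r"[a-z]+", keyword.lower())
    if not words or not key:
        return 0.0
    hits = sum(
        1
        for i in range(len(words) - len(key) + 1)
        if words[i:i + len(key)] == key
    )
    return hits * len(key) / len(words)


# Made-up sentences, for illustration only.
natural = "Buy furniture online and have your new sofa delivered within a week."
stuffed = "Buy furniture here. Buy furniture now. Buy furniture cheap. Buy furniture today."

natural_density = keyword_density(natural, "buy furniture")
stuffed_density = keyword_density(stuffed, "buy furniture")
```

In the first sentence the phrase accounts for 2 of the 12 words. In the second it accounts for 8 of 12, and the copy reads accordingly.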

But this is outdated SEO methodology, and not the way Google works anymore.

A far better use of TF-IDF would be to find ideas for new, relevant content your copy is missing. Yet in this regard, there are better tools than TF-IDF.

With drawbacks that include keyword stuffing, sample size and keyword relevance, are there any alternatives to TF-IDF SEO?

Alternatives to TF-IDF SEO

Latent semantic indexing and topic modelling are two such alternatives to TF-IDF that we will explore:

Latent Semantic Indexing (LSI)

In Latent Semantic Indexing, Google uses words that are conceptually related to each other to understand a webpage’s content and overall topic.

I used LSI Graph to see what LSI terms I could find for the keyword ‘buy furniture’:

The most relevant ones included:

  • Best place to buy furniture on a budget
  • Best places to buy cheap furniture online
  • Clearance furniture outlet
  • Factory furniture outlet
  • Online furniture stores free shipping

These LSI keywords are clearly not synonyms for our focus keyword.

But they do relate topically.

Creating content that addressed the above keywords would help Google understand our primary topic – selling furniture to customers – even more.

This is the basic premise of Latent Semantic Indexing.

Topic modelling

Topic modelling is about finding the relationships between words and phrases. It can be thought of as a type of LSI.

When Google is looking for the right webpage to show a user, it isn’t just searching for certain words. It’s looking for words with a particular meaning.

Topic modelling involves creating topic clusters that include your focus topic, related topics and secondary related topics and covering these topic clusters with your content.

What this means, in a nutshell, is that you can’t rely on a single keyword for your content. Instead, you can use it as your focus topic, but you’ll need to create supporting topics too.

MarketMuse is one such tool whose focus lies in topic modeling. You input a keyword into the software, and receive subtopics as well as questions and buyer personas to target with your content. Unlike most TF-IDF tools, MarketMuse analyzes tens of thousands of webpages.

I got the following results using MarketMuse for the term ‘buy furniture’:

This shows us that our competitors that are ranking well for our target keyword mention things like ‘living room furniture’ and ‘bedroom furniture’ on their sites.

These are terms that are related to our overall topic (buy furniture). Again, they are not synonyms for our keyword. This time, they are subtopics that fall under it.

With this information in mind, we could potentially build out separate pages to target these subtopics. We could have a page dedicated to sofas, to beds and to living room furniture, for example.
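A topic cluster like this can be written down as a simple data structure and checked against the pages a site already has. The sketch below is hypothetical throughout: the cluster contents are taken from the example above, but the URL paths, the page text, the function name and the plain substring matching are assumptions, and it has nothing to do with how MarketMuse works internally.

```python
# Focus topic plus the subtopics we want the site to cover.
topic_cluster = {
    "focus": "buy furniture",
    "subtopics": ["living room furniture", "bedroom furniture", "sofas", "beds"],
}


def uncovered_subtopics(cluster, pages):
    """Return the subtopics that no page mentions.

    pages maps a URL path to that page's text; matching is a simple
    case-insensitive substring check.
    """
    texts = [text.lower() for text in pages.values()]
    return [
        subtopic
        for subtopic in cluster["subtopics"]
        if not any(subtopic.lower() in text for text in texts)
    ]


# Hypothetical site content keyed by URL path.
site_pages = {
    "/sofas": "Browse our range of sofas.",
    "/living-room": "Living room furniture for every budget.",
}
gaps = uncovered_subtopics(topic_cluster, site_pages)
```

For this made-up site the gaps are ‘bedroom furniture’ and ‘beds’, which would be the next pages to build.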

If you’ve used TF-IDF, LSI or topic modelling tools before, then you’ll know that it can seem like they’re doing three variations of the same thing. Their outputs are certainly similar: keywords and topics taken from competitors that can be used to cover additional aspects of your content with more depth.

There is also a lot of overlap between the three, and in my research, I found that what a lot of websites described as TF-IDF was, more accurately, topic modelling.

Yet, understanding how all of these models work individually, and how they work together, remains crucial to developing content that Google will reward.

Our verdict: should you use TF-IDF or not?

TF-IDF does have its uses in SEO, but the drawbacks that have been mentioned in this article are valid too.

Ultimately, topic modelling and LSI keywords are more useful and will have a bigger impact on your SEO and content marketing efforts.

They’re particularly great tools for building out new webpages – use them to help create your structure and really flesh out your central topic.

But you can also use them to optimize existing content by researching what topics your content does not cover… yet.

John Mueller was certainly right when he said that we shouldn’t focus all of our SEO efforts on TF-IDF.

Instead, let’s understand it as one of the many signals that Google uses to evaluate content. While it is useful in some situations, there are ultimately better tools and models out there for SEO.

If you’d like more information on TF-IDF, or you’d like to develop topic models for your website that will outrank your competitors, get in touch!

[/et_pb_text][/et_pb_column][/et_pb_row][/et_pb_section]

Written by Justin Lester

Justin is a successful entrepreneur who launched and exited two online startups by the age of 24.

He specialises in SEO and Paid Media services and played a critical role in developing cutting-edge marketing applications in the iGaming sector.

He founded Ruby Digital in 2011, which has won numerous awards and is recognised as having the 2nd highest culture score of any SME in South Africa.

Justin is committed to empowering communities with digital marketing skills and is an active member of the entrepreneurship community, sharing his knowledge and expertise as a guest lecturer at the University of Cape Town and serving on the board of the Entrepreneurs Organization Accelerator Program.