Put simply, website categorization is the act of placing websites into categories. For instance, sites like Twitter and Facebook are categorized as social media, while sites like CNN are categorized as news.
But as simple as this seems, there’s a little more to website categorization than meets the eye.
Website categorization is sometimes mistaken for website rankings (like Alexa) or SEO rankings (like Google). Website categorization is only concerned with defining the type of content on a website. It does not take into account how often a site is visited or how likely it is to show up in a search result.
Additionally, the categories defined within blog platforms like WordPress do not affect website categorization. Those internal categories are sometimes called a website taxonomy. The taxonomies used within websites are usually custom and depend on the overall theme of the blog. Domain categorization software does not assess the content on individual pages; instead, it categorizes the site as a whole.
Reddit and Facebook are great examples of this. These sites are considered message boards and social media, respectively. However, both Reddit and Facebook host communities dedicated to things like fitness. A domain categorization tool would never examine these websites and say their purpose is fitness, even if they have multiple URLs dedicated to running, weightlifting, etc. If a website categorization tool took the category of each URL of a website into account, it wouldn’t be useful. It would return nearly every possible website category, and it would make websites like Reddit and Facebook seem the same when they are in fact very different types of websites that serve completely different purposes.
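To make the site-level idea concrete, here is a minimal Python sketch. It assumes a small, hypothetical lookup table (`DOMAIN_CATEGORIES`) and a hypothetical helper (`categorize_url`); a real categorization tool would query its own database and handle subdomains more carefully. The point is only that the path of the URL is ignored and the domain decides the category.

```python
from urllib.parse import urlsplit

# Hypothetical domain-to-category table, using the examples from this article.
DOMAIN_CATEGORIES = {
    "reddit.com": "message boards",
    "facebook.com": "social media",
    "cnn.com": "news",
}


def categorize_url(url: str) -> str:
    """Return the category of the site a URL belongs to, ignoring its path."""
    host = (urlsplit(url).hostname or "").lower()
    # Treat www.reddit.com and reddit.com as the same site.
    if host.startswith("www."):
        host = host[4:]
    return DOMAIN_CATEGORIES.get(host, "uncategorized")
```

With this sketch, a fitness community URL such as https://www.reddit.com/r/fitness/ still resolves to "message boards", because only the domain is considered.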
Anyone can start a website. But when you register a domain, there’s no selection box for what type of website you’re creating. There is no official internet taxonomy for classifying websites.
When we discuss domain categorization, the taxonomies we’re discussing are most often either created by vendors or defined as standards within an industry.
The most common taxonomy used online, and one you may have run into, was developed by the IAB (the Interactive Advertising Bureau, or the Internet Advertising Bureau in the UK). The IAB format is a standard used in digital advertising that makes it easy for advertisers to choose the placement of their ads. This allows them to promote their technology solution on a technology website or their healthcare solution on a healthcare website, avoiding websites that are irrelevant to their business.
The IAB taxonomy is made up of parent and child categories, giving users roughly 400 categories to choose from.
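A parent and child taxonomy of this kind can be pictured as a nested mapping. The sketch below is illustrative only: the category names are placeholders, not entries from the actual IAB list, and `TAXONOMY` and `parent_of` are hypothetical names.

```python
from typing import Optional

# Placeholder taxonomy: each parent category maps to its child categories.
TAXONOMY = {
    "Technology": ["Software", "Hardware"],
    "Health": ["Fitness", "Nutrition"],
}


def parent_of(child: str) -> Optional[str]:
    """Return the parent category of a child category, or None if unknown."""
    for parent, children in TAXONOMY.items():
        if child in children:
            return parent
    return None


def category_count() -> int:
    """Count every category in the taxonomy, parents and children together."""
    return len(TAXONOMY) + sum(len(children) for children in TAXONOMY.values())
```

An advertiser targeting a parent category would match every site tagged with one of its children, which is why the two-level structure is convenient.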
Website categorization vendors also create their own unique taxonomies to fit their business needs. Website categories vary widely depending on your provider, though more concise and simplified taxonomies are preferred to avoid unnecessarily complex categorization. Depending on the vendor, there may or may not be parent or child categories, and malicious content may be within the taxonomy or outside of it.
No matter what, website categorization occurs when a website categorization tool ingests a website. What differs from vendor to vendor is how these domains are ingested.
At Webshrinker, we ingest domains when our customers use our tool. Often, a customer needs a website categorized that our software is encountering for the first time. Our AI then scans the website’s content and images for clues as to which content category the domain falls into. We also ingest domain feeds from external sources and run those sites through our AI categorization, so we are always building on our database.
There is a broad assortment of use cases for website categorization, but here are a few common benefits:
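The general pattern described here (classify a domain the first time it is seen, remember the result, and also pre-fill the database from feeds) can be sketched as follows. This is not Webshrinker’s implementation: `CategoryStore` is a hypothetical class, and the `classify` callable is a stand-in for whatever classifier a vendor actually uses.

```python
from typing import Callable, Dict, Iterable


class CategoryStore:
    """Remembers categories by domain and classifies a domain on first sight."""

    def __init__(self, classify: Callable[[str], str]) -> None:
        self._classify = classify  # stand-in for the real classifier
        self._known: Dict[str, str] = {}

    def lookup(self, domain: str) -> str:
        """Return the stored category, classifying the domain if it is new."""
        domain = domain.lower()
        if domain not in self._known:
            self._known[domain] = self._classify(domain)
        return self._known[domain]

    def ingest_feed(self, domains: Iterable[str]) -> None:
        """Pre-populate the store from an external list of domains."""
        for domain in domains:
            self.lookup(domain)
```

The classifier only runs once per domain in this sketch; a production system would presumably also re-check sites over time, since content can change.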
The first (and most obvious) benefit of website categorization is understanding exactly what the purpose of a website is. As mentioned earlier, anyone can create a website, and even if the creator of that website claims it has a certain purpose, the truth might be totally different. Self-reported categorization would not be valuable even if that data were available, as it could be wildly inaccurate or misleading. The ability to definitively say “this is what this website is about” is incredibly valuable for many businesses, whether they’re looking to integrate with a domain categorization tool or rely on it separately.
Another benefit of domain categorization is security. Whether you’re looking to implement content filtering or network security, the ability to block all malicious content, or sites commonly associated with it, will protect both employees and end users.
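As a rough illustration of category-based filtering, the sketch below blocks a request when the domain’s category is on a block list. The category labels, `BLOCKED_CATEGORIES`, and `is_blocked` are all hypothetical; actual labels depend on the vendor’s taxonomy.

```python
from typing import Dict, Set

# Hypothetical category labels that a filter might refuse to serve.
BLOCKED_CATEGORIES: Set[str] = {"malware", "phishing"}


def is_blocked(
    domain: str,
    categories_by_domain: Dict[str, str],
    blocked: Set[str] = BLOCKED_CATEGORIES,
) -> bool:
    """Return True when the domain's known category is on the block list."""
    category = categories_by_domain.get(domain.lower())
    # Domains with no known category are allowed in this sketch;
    # a stricter policy could block them instead.
    return category in blocked
```

Whether unknown domains should be allowed or blocked is a policy decision; this sketch allows them only to keep the example short.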
Website classification also gives brands protection over where their company ads appear. It’s not just about appearing in relevant places, but about avoiding unsavory domains that may damage brand images.
What are you looking to achieve by using website categorization? Sign up for a demo and let us know.