

Travel Explora Hub

Unbiasing Duplicate Or Misspelt Locations
Travel Gems relies on more than 28 aggregators and API service providers to gather key information about travel and tourism establishments.
To ensure that our guests, our clients and the locations themselves get fair reviews, ratings and price options, our system needs to be as accurate as possible.
That can be quite challenging, especially when it comes to misspelt names, duplicate listings, or different businesses that share the same name.
So, how do we solve this?
Meet Vladimir Levenshtein, a Soviet mathematician and computer scientist who introduced the concept now known as the Levenshtein Distance. Born in 1935, Levenshtein made significant contributions to the fields of information theory and string processing.
He formulated a way to measure the difference between two strings.
When we refer to a “string”, we don’t mean a physical string: in this case, a string is a location name. Likewise, the “distance” is not a physical distance but a measure of how different two location names supplied by our aggregators are, namely the minimum number of single-character insertions, deletions or substitutions needed to turn one name into the other.
So, when we have a number of travel locations with similar names, we apply the Levenshtein Distance to measure how closely those names match, which helps us check that the aggregators and APIs are giving us the correct results.
Example
Let’s compare two hotel names:
- “Grand Hotel Paris”
- “Grand Hotel Berlin”
Steps to transform one name into the other:
- Substitute ‘P’ with ‘B’ to get “Grand Hotel Baris”.
- Substitute ‘a’ with ‘e’ to get “Grand Hotel Beris”.
- Insert ‘l’ after ‘r’ to get “Grand Hotel Berlis”.
- Substitute ‘s’ with ‘n’ to get “Grand Hotel Berlin”.
It takes 4 operations (three substitutions and one insertion) to transform “Grand Hotel Paris” into “Grand Hotel Berlin”, and no shorter sequence of edits exists, so the Levenshtein Distance between the two names is 4.
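To double-check the example, the short Python sketch below replays the four edits as plain string operations and prints each intermediate name. The helpers `substitute` and `insert` are illustrative names of our own. Note that this only shows four edits are enough; that no shorter sequence exists is what the dynamic-programming algorithm in the next section establishes.

```python
def substitute(s: str, index: int, ch: str) -> str:
    """One substitution: replace the character at `index` with `ch`."""
    return s[:index] + ch + s[index + 1:]


def insert(s: str, index: int, ch: str) -> str:
    """One insertion: place `ch` before position `index`."""
    return s[:index] + ch + s[index:]


name = "Grand Hotel Paris"
start = len("Grand Hotel ")  # the shared prefix needs no edits

# Each step is a single edit operation applied to the current string.
steps = [
    ("substitute 'P' -> 'B'", lambda s: substitute(s, start, "B")),
    ("substitute 'a' -> 'e'", lambda s: substitute(s, start + 1, "e")),
    ("insert 'l' after 'r'", lambda s: insert(s, start + 3, "l")),
    ("substitute 's' -> 'n'", lambda s: substitute(s, start + 5, "n")),
]

history = [name]
for label, edit in steps:
    name = edit(name)
    history.append(name)
    print(f"{label:<22} {name}")

assert name == "Grand Hotel Berlin"
print("operations:", len(steps))  # 4
```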
Algorithm
The Levenshtein Distance can be computed using a dynamic programming approach. Here’s a high-level explanation of the algorithm:
- Create a Matrix: Initialize a matrix where the cell d[i][j] represents the Levenshtein Distance between the first i characters of the first string and the first j characters of the second string.
- Initialize the Matrix:
  - The distance between an empty string and a string of length j is j (i.e., d[0][j] = j).
  - The distance between a string of length i and an empty string is i (i.e., d[i][0] = i).
- Fill the Matrix: For each cell d[i][j], calculate the minimum of:
  - Insertion: d[i][j-1] + 1
  - Deletion: d[i-1][j] + 1
  - Substitution: d[i-1][j-1] + cost (where cost is 0 if the characters are the same, 1 if they are different).
- Retrieve the Distance: The value in d[m][n] (where m and n are the lengths of the two strings) represents the Levenshtein Distance.
Convert this into a script:
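A minimal Python sketch of the four steps above might look like the following: a full matrix d, its first row and column initialized, each cell filled with the minimum of insertion, deletion and substitution, and the answer read from d[m][n]. The `similarity` score (1 minus the distance divided by the longer name's length), the case-folding, and the 0.85 threshold in `find_near_duplicates` are our own illustrative assumptions for flagging likely duplicate listings, not a description of the actual Travel Gems pipeline.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Levenshtein distance using the full (m+1) x (n+1) matrix."""
    m, n = len(a), len(b)
    # d[i][j] = distance between the first i chars of a and the first j chars of b
    d = [[0] * (n + 1) for _ in range(m + 1)]

    # Initialize: converting to or from an empty string costs one edit per character
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j

    # Fill the matrix row by row
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j] + 1,         # deletion
                d[i - 1][j - 1] + cost,  # substitution (a match when cost == 0)
            )

    # Retrieve the distance
    return d[m][n]


def similarity(a: str, b: str) -> float:
    """Hypothetical normalized score in [0, 1]; 1.0 means identical ignoring case."""
    a, b = a.casefold(), b.casefold()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty names are trivially identical
    return 1.0 - levenshtein(a, b) / longest


def find_near_duplicates(names, threshold=0.85):
    """Return (name, name, score) for every pair scoring at least `threshold` (hypothetical default)."""
    pairs = []
    for a, b in combinations(names, 2):
        score = similarity(a, b)
        if score >= threshold:
            pairs.append((a, b, round(score, 3)))
    return pairs


if __name__ == "__main__":
    print(levenshtein("Grand Hotel Paris", "Grand Hotel Berlin"))  # 4
    listings = ["Grand Hotel Paris", "Grand Hotel Pairs", "Grand Hotel Berlin"]
    print(find_near_duplicates(listings))
    # [('Grand Hotel Paris', 'Grand Hotel Pairs', 0.882)]
```

Keeping the whole matrix mirrors the description above; if memory matters, only the previous row is needed at each step. The threshold would need tuning against real listings, since short names that differ by one or two letters can still belong to different businesses.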
And that’s it!
That is how we make sure our Travel Gems and travel destinations get the coverage they deserve, and that our clients can be confident they are seeing accurate listings of the best locations.
