Search methods, "fuzzy" logic, duplicate checking, oh my!
Apologies in advance for the cross-post. Not sure if this is better handled
in ASP code or TransactSQL.
Our contact search screen allows the users to search for a contact by phone,
lastname, firstname, city, state, zip. If no matches are found, then the
user is given the option of adding the information into the database as a
new contact. When searching, all whitespace and punctuation is removed for
The database is populated two ways:
1. by hand by our users, where the format is controlled through the
interface - i.e. phone numbers always have an area code, and the numbers are
stored with dashes. Numbers, dashes, spaces, and parentheses are the only
2. Client data as CSV files from a variety of proprietary sources, and
the data could have any ol' crap in it. We scrub as best we can, but some
garbage gets through.
So, what we find is when doing the phone number search, someone with
816-555-1234 won't be found if the user enters 555-1234, and vice versa.
Likewise, if any of the names are mis-spelled, the search will fail.
We'd like to make the query "smart". We've explored SOUNDEX, but the
results that come back often confuse the end-user: "Mercer" comes back with
"Mercer", "Mercier" (not too bad, yet), "Marker" (ok, but...), "Markwardt"
(um...), "Margarucci" (now we're getting blank stares...), and against
numeric values (like phone) it just dies. DIFFERENCE might get us there
somehow, though it returns a 4 on every variation in the preceeding list so
we can't sort that way. I've seen some searches where they're sorted by
"relevance", but I'm not sure how to make that happen.
So I'm looking for ideas on making a "smart search". Thanks just for
reading this far, and thanks for any help you can give.
Product Development, Seritas LLC
Kansas City, Missouri
Re: Search methods, "fuzzy" logic, duplicate checking, oh my!
On Wed, 25 Feb 2004 10:27:08 -0600, William Morris
> contactid int
> firstname varchar(25)
> lastname varchar(25)
> address varchar(255)
> city varchar(100)
> state int
> zip varchar(10)
> homephone varchar(25)
> workphone varchar(25)
> mobilephone varchar(25)
> So I'm looking for ideas on making a "smart search". Thanks just for
> reading this far, and thanks for any help you can give.
You have clearly goteen into some more advanced areas than i can speak to,
but I wanted to share a couple simple things things I have used in the
-- Use a "LIKE" clause in SQL to search for phone numbers without area
codes and to search for words or names that begin with the entered text
(searching for "Mer" to get "Mercer"). For example:
SELECT firstname, lastname FROM tblPeople WHERE homephone LIKE '%5551212'
-- will give you all the people with phone numbers that are the same but
include an area code
-- You might be able to reduce some of your search complexity depending on
the interface by using a list that dynamically reduces itself as the user
enters their search string (at least in the case of names or cities),
showing those people whose names are alphabetically similar. Some exciting
longer list of possibilities a user can look ahead at the whole list and
often find what they are looking for faster than a search. Also neatly
combined with an autocomplete script to fill in the rest of the selected
-- Encourage or even only offer a zip code search. Zip codes are far more
likely to be entered correctly than cities and states when dealing with
lots of user-created data. To reduce errors in cities and states, you
could even consider regenerating cities and states on import from a
commercially-available zip code database.
|All times are GMT. The time now is 11:21 PM.|
Powered by vBulletin®. Copyright ©2000 - 2013, vBulletin Solutions, Inc.
SEO by vBSEO ©2010, Crawlability, Inc.