Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > ASP .Net > ASP General > Search methods, "fuzzy" logic, duplicate checking, oh my!

Reply
Thread Tools

Search methods, "fuzzy" logic, duplicate checking, oh my!

 
 
William Morris
Guest
Posts: n/a
 
      02-25-2004
Apologies in advance for the cross-post. Not sure if this is better handled
in ASP code or TransactSQL.

Windows2000 Server
SQL 2000

tblPeople
contactid int
firstname varchar(25)
lastname varchar(25)
address varchar(255)
city varchar(100)
state int
zip varchar(10)
homephone varchar(25)
workphone varchar(25)
mobilephone varchar(25)

Our contact search screen allows the users to search for a contact by phone,
lastname, firstname, city, state, zip. If no matches are found, then the
user is given the option of adding the information into the database as a
new contact. When searching, all whitespace and punctuation is removed for
the comparison.

The database is populated two ways:

1. by hand by our users, where the format is controlled through the
interface - i.e. phone numbers always have an area code, and the numbers are
stored with dashes. Numbers, dashes, spaces, and parentheses are the only
allow characters.
2. Client data as CSV files from a variety of proprietary sources, and
the data could have any ol' crap in it. We scrub as best we can, but some
garbage gets through.

So, what we find is when doing the phone number search, someone with
816-555-1234 won't be found if the user enters 555-1234, and vice versa.
Likewise, if any of the names are mis-spelled, the search will fail.

We'd like to make the query "smart". We've explored SOUNDEX, but the
results that come back often confuse the end-user: "Mercer" comes back with
"Mercer", "Mercier" (not too bad, yet), "Marker" (ok, but...), "Markwardt"
(um...), "Margarucci" (now we're getting blank stares...), and against
numeric values (like phone) it just dies. DIFFERENCE might get us there
somehow, though it returns a 4 on every variation in the preceeding list so
we can't sort that way. I've seen some searches where they're sorted by
"relevance", but I'm not sure how to make that happen.

So I'm looking for ideas on making a "smart search". Thanks just for
reading this far, and thanks for any help you can give.

- Wm


--
William Morris
Product Development, Seritas LLC
Kansas City, Missouri


 
Reply With Quote
 
 
 
 
jenny mabe
Guest
Posts: n/a
 
      02-26-2004
On Wed, 25 Feb 2004 10:27:08 -0600, William Morris
<(E-Mail Removed)> wrote:

> tblPeople
> contactid int
> firstname varchar(25)
> lastname varchar(25)
> address varchar(255)
> city varchar(100)
> state int
> zip varchar(10)
> homephone varchar(25)
> workphone varchar(25)
> mobilephone varchar(25)
>
> So I'm looking for ideas on making a "smart search". Thanks just for
> reading this far, and thanks for any help you can give.


You have clearly goteen into some more advanced areas than i can speak to,
but I wanted to share a couple simple things things I have used in the
past:

-- Use a "LIKE" clause in SQL to search for phone numbers without area
codes and to search for words or names that begin with the entered text
(searching for "Mer" to get "Mercer"). For example:

SELECT firstname, lastname FROM tblPeople WHERE homephone LIKE '%5551212'
-- will give you all the people with phone numbers that are the same but
include an area code

-- You might be able to reduce some of your search complexity depending on
the interface by using a list that dynamically reduces itself as the user
enters their search string (at least in the case of names or cities),
showing those people whose names are alphabetically similar. Some exciting
javascript and ASP trickery is needed to make this work, but by showing a
longer list of possibilities a user can look ahead at the whole list and
often find what they are looking for faster than a search. Also neatly
combined with an autocomplete script to fill in the rest of the selected
name automatically.

-- Encourage or even only offer a zip code search. Zip codes are far more
likely to be entered correctly than cities and states when dealing with
lots of user-created data. To reduce errors in cities and states, you
could even consider regenerating cities and states on import from a
commercially-available zip code database.


HTH,
Jenny
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Search for duplicate elements in the array Ruhe Ruby 2 09-16-2007 12:13 PM
removing search engines from the search bar whatever.or.not@gmail.com Firefox 1 02-17-2005 12:06 PM
Search Bar not displaying search Engines Zimran Douglas Firefox 1 01-07-2005 02:30 PM
search within a search within a search - looking for better way...my script times out Abby Lee ASP General 5 08-02-2004 04:01 PM



Advertisments