Friday, January 27, 2012

Instant’s Auto complete Suggestions Review

Over the past couple of years, auto-complete has become famous over the web. Facebook, YouTube, Google, Bing, MSDN, LinkedIn,Apple and other websites try to complete text phrases as soon as we typing.
Auto-complete definitely makes for a nice user experience, but it can be a challenge to implement efficiently. In many cases, an efficient implementation requires the use of interesting algorithms and data structures. In this blog post, I will describe all about it.
Basic Idea
Basically Auto complete is combination of HTML,JS,CSS and data source. It could be JS string or a file or data fetching from db connection. we usually fetch data from source on the basis of JS events like onkeyup etc. 
Engineering Challenges
however we notice that Facebook or google's auto suggestions are working very fast or with good speed.  I encountered first time with speed issue when i worked for www.peerpower.com. Actually we had planed to use Auto suggester frequently to show master data in drop downs. see registration page http://www.peerpower.com/register.php. That time i implemented incremental caching feature at browser side but later i found that still i don't touch facebook and google standards.
However now some open sourse caching plugins are avalable by default. see http://docs.jquery.com/Plugins/Autocomplete and http://flexbox.codeplex.com.
Typically following are the key points which play game in efficient manner.
  1.  Data structure to retrieve big data
  2.  client side caching from memcache indexed javascript array (for example first letter as index, so friends['a'] = [andrey, albert] )
  3. client side storage techniques (normally all modern browser support it )
  4.  warm up caches as early as it can
  5. build an index of names -> dom elements, do dom manipulation offline and attach the results with only people that match the searched term

Data structure to retrieve big data
Ternary search trees Trie is used to retrive data for auto-complete lookup but needs too much memory. Solution for it PATRICIA (Practical Algorithm to Retrieve Information Coded in Alphanumeric). http://www.csse.monash.edu.au/~lloyd/tildeAlgDS/Tree/PATRICIA/
Both are data structues so you can read them about in details separately. 
Other Challenges

Suggestions Can Vary By Region & Language

Not everyone sees the same suggestions. For example if i am in delhi then i will see delhi based results with high priority.

Previously Searched Suggestions

Big demand in suggestor product to track how are things going.

How Suggestions Are Ranked

Big question if you plan to put suggestor at home page.Popularity is a factor, but some less popular searches might be shown above more popular ones.

Deduplicating & Spelling Corrections

Its basic functionality that everyone dreams in auto suggestor :P

Freshness

If there are terms that suddenly spike in popularity in the short term, these can appear as suggestions, even if they haven’t gained long-term popularity.












No comments:

Post a Comment