Overview
Any retail or product site or portal needs a robust search engine to pull up the information you want. It has to be fast enough to hook you with the right information and help you through. If it is not, you move on to the next site that satisfies your shopping needs. Such a search engine can serve many business requirements, not just the retail or product industry. You can use it in any domain where data search is essential.
Fundamentals-
Search works by parsing data, extracting information and presenting it to end users. Parsing must be fast, and that speed comes from indexing the data. If your data sits in a database server, querying it directly can add overhead and a lot of response time. So the data is processed and indexed by a scheduled job during non-business hours. The index may sit on a dedicated search engine server for better performance.
How do we help end users, and what goes in and out?
Dictionary- This is the placeholder where all important words are organised and sorted. It is a lookup or catalog that is consulted before the detailed results are actually fetched. It acts like an address.
Once the user gives search words, phrases or anything else, the search system has to be smart enough to pick the right reference from the dictionary and send the detailed information back to the end user.
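The dictionary idea above can be sketched as a small inverted index: each word points to the documents that contain it, much like an address. This is only a minimal sketch; the sample documents and the names build_index and lookup are made up for illustration and do not come from any particular search product.

```python
from collections import defaultdict

# Hypothetical sample catalog; in practice this would come from the database.
DOCUMENTS = {
    1: "red sports car",
    2: "family car with large boot",
    3: "mountain bike",
}

def build_index(documents):
    """Map each word to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    # Keep the words sorted, as the dictionary is described as organised and sorted.
    return {word: sorted(ids) for word, ids in sorted(index.items())}

def lookup(index, query):
    """Return ids of documents that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return []
    result = set(index.get(words[0], []))
    for word in words[1:]:
        result &= set(index.get(word, []))
    return sorted(result)

# In a real setup this step would run in the scheduled indexing job.
INDEX = build_index(DOCUMENTS)
print(lookup(INDEX, "sports car"))
```

The index is built once, ahead of time, so a query only touches the small word-to-ids mapping instead of scanning every record.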
To do this, the following components come into play.
Linguistic Guide- Identifies the language of the document or data, using the language culture.
Tokenization- Splits the stream of text into tokens and normalizes phrases or joined words, through stemming or segmentation.
Lemmatization- Identifies the grammatical forms of words: singular and plural (car or cars), positive, comparative and superlative for adjectives, and tenses for verbs. A search for compute may also return computer or computing.
Spell Check- There are several properties, such as threshold, max length, min length, and exact match or tolerance level. It corrects wrongly spelled words and suggests "Did you mean?" words to display results. It is important that the dictionary holds correctly spelled words to look up.
Anti Phrasing and stop words: Filters out the most common phrases, such as "I am", "who is?", "where are?" etc.
For example, if you search for "Who is Mr. Miller", it will search the source only for "Mr. Miller".
Synonyms: Checks for words with similar meanings. For example, a search for "My Car" will return automobile information as well.
Entity Extraction: Detects entities such as names, companies, locations, countries, street names, file names, telephone numbers, etc.
Noun Phrase Extraction: Identifies noun phrases using the dictionary.
Structural Analysis: Classifies the content of a page, for example as buy, order or shopping.
Phonetic Search: Handles search words that are misspelled because of how they are pronounced. For example, a user looking for Schwarzenegger may search for shwarsenegger.
Offensive Content Filter: Content with offensive meanings is filtered out.
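A few of the steps above (tokenization, anti-phrasing and stop words, a very rough plural reduction, and synonyms) can be chained into one small query analyzer. The sketch below is only an illustration: the word lists, the naive_stem rule and the function names are assumptions, and a real engine would use proper linguistic resources instead.

```python
import re

# Illustrative word lists, not taken from any real product.
ANTI_PHRASES = ("who is", "where are", "i am")
STOP_WORDS = {"a", "an", "the", "my", "of"}
SYNONYMS = {"car": {"automobile"}}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def strip_anti_phrases(text):
    """Remove common question phrases such as 'who is' from the query."""
    lowered = text.lower()
    for phrase in ANTI_PHRASES:
        lowered = re.sub(r"\b" + re.escape(phrase) + r"\b", " ", lowered)
    return lowered

def naive_stem(token):
    """Very rough plural stripping ('cars' -> 'car'); not a real stemmer."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token

def analyze(query):
    """Turn a raw query into search terms, each followed by its synonyms."""
    tokens = tokenize(strip_anti_phrases(query))
    terms = [naive_stem(t) for t in tokens if t not in STOP_WORDS]
    expanded = []
    for term in terms:
        expanded.append(term)
        expanded.extend(sorted(SYNONYMS.get(term, ())))
    return expanded

print(analyze("Who is Mr. Miller"))  # ['mr', 'miller']
print(analyze("My Cars"))            # ['car', 'automobile']
```

The order matters here: anti-phrases are removed before tokenizing so that multi-word phrases can still be matched as a whole.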
The search engine has its own syntax and semantics for using the constructs above, and together they make for a smart search engine.
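Spell check and phonetic search, described above, both come down to matching a query word against the dictionary with some tolerance. Here is a minimal sketch using Python's standard difflib module for the "Did you mean?" suggestion, plus a toy phonetic key. The dictionary contents, the cutoff value and the phonetic_key rules are assumptions for illustration; the key is a crude home-made one, not Soundex or any other standard algorithm.

```python
import difflib
import re

# Hypothetical dictionary of correctly spelled index terms.
DICTIONARY = ["automobile", "computer", "miller", "schwarzenegger"]

def suggest(word, cutoff=0.8):
    """Return the closest dictionary word, or None if nothing is close enough.

    The cutoff plays the role of the tolerance level: higher means stricter.
    """
    matches = difflib.get_close_matches(word.lower(), DICTIONARY, n=1, cutoff=cutoff)
    return matches[0] if matches else None

def phonetic_key(word):
    """Toy phonetic key: merge similar sounds, collapse doubles, drop vowels."""
    key = word.lower()
    key = key.replace("sch", "s").replace("sh", "s").replace("z", "s")
    key = re.sub(r"(.)\1+", r"\1", key)               # collapse repeated letters
    return key[:1] + re.sub(r"[aeiou]", "", key[1:])  # keep first letter as is

def phonetic_lookup(word):
    """Return dictionary words that sound like the given word under the toy key."""
    target = phonetic_key(word)
    return [entry for entry in DICTIONARY if phonetic_key(entry) == target]

print(suggest("computr"))                # computer
print(phonetic_lookup("shwarsenegger"))  # ['schwarzenegger']
```

Both functions only work if the dictionary holds correctly spelled words, which is why the quality of the dictionary matters so much.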