How does the search engine work?

The MagPortal.com search engine does a case-insensitive match of your query against the title, description, authors, and body of each article. Your query is first split into "words" (runs of adjacent alphanumeric characters), and all other characters are discarded. Words that match no articles are discarded, as are very common words (like "the"). Only articles that match all of the remaining words are presented to you. To see which words were actually used (not discarded) in the search, look at the "Actual search terms used" output below the search box on the results page.
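The query processing described above can be sketched roughly as follows. This is an illustrative sketch, not the site's actual code: the function names, the stop-word list, and the treatment of each article as a single string (title, description, authors, and body concatenated) are all assumptions, as is the behavior when every word is discarded.

```python
import re

# Hypothetical list; the site does not document which words count as "very common".
STOP_WORDS = {"the", "a", "an", "of", "and"}


def tokenize(text):
    """Lowercase the text and return its runs of ASCII letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def search_terms(query, articles):
    """Return the query words that are neither very common nor absent from every article."""
    vocabulary = set()
    for article in articles:
        vocabulary.update(tokenize(article))
    return [w for w in tokenize(query) if w not in STOP_WORDS and w in vocabulary]


def matching_articles(query, articles):
    """Return the articles that contain every surviving query word."""
    terms = set(search_terms(query, articles))
    if not terms:
        return []  # assumption: nothing is shown when every word is discarded
    return [a for a in articles if terms <= set(tokenize(a))]


articles = [
    "The Best Pasta Recipes",
    "Pasta on a Budget",
    "Choosing Garden Tools",
]
print(search_terms("the Pasta-RECIPES xyzzy", articles))
print(matching_articles("the Pasta-RECIPES xyzzy", articles))
```

In this sketch the query "the Pasta-RECIPES xyzzy" is reduced to the terms "pasta" and "recipes": the hyphen is dropped, "the" is treated as too common, and "xyzzy" matches nothing. Only the first article contains both remaining words.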

By default, the output is ordered by the "quality" of the match. The quality is computed with a standard term frequency-inverse document frequency (TF-IDF) formula (not related to Hot Neuron Similarity™), which takes into account how often each search term appears in the document (relative to the total length of the document) and how rare that search term is among all documents. For more details on search engine algorithms, see the page Information Retrieval and Search. For even more detail, try the books Modern Information Retrieval and Managing Gigabytes.
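To make the two ingredients of the quality score concrete, here is a minimal sketch of one common TF-IDF variant. The exact formula the site uses is not given, so the logarithmic weighting, the function names, and the sample documents below are assumptions for illustration only.

```python
import math
import re


def tokenize(text):
    """Lowercase the text and return its runs of ASCII letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tf_idf_score(terms, document, corpus):
    """Score one document against the search terms.

    tf  = occurrences of the term / total words in the document
    idf = log(number of documents / number of documents containing the term)
    """
    words = tokenize(document)
    if not words:
        return 0.0
    score = 0.0
    for term in terms:
        df = sum(1 for doc in corpus if term in tokenize(doc))
        if df == 0:
            continue  # the term appears nowhere, so it contributes nothing
        tf = words.count(term) / len(words)
        idf = math.log(len(corpus) / df)
        score += tf * idf
    return score


def rank(terms, corpus):
    """Return the documents ordered from best to worst match."""
    return sorted(corpus, key=lambda doc: tf_idf_score(terms, doc, corpus), reverse=True)


corpus = [
    "pasta recipes pasta",
    "pasta salad ideas for summer",
    "garden tools",
]
for doc in rank(["pasta"], corpus):
    print(round(tf_idf_score(["pasta"], doc, corpus), 3), doc)
```

The first document ranks highest for "pasta" because the word makes up a larger share of its text. A word that appeared in every document would get an idf of zero under this variant and would not affect the ordering at all.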

Our search engine does not do "stemming," meaning that it does not recognize that "recipe" and "recipes" are forms of the same word. A search for "recipe" will therefore miss articles that contain only "recipes."
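The effect of matching without stemming can be illustrated with a small sketch. The helper names and the sample title are hypothetical; the point is only that whole words are compared exactly.

```python
import re


def tokenize(text):
    """Lowercase the text and return its runs of ASCII letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_word(word, text):
    """Exact whole-word comparison, with no stemming of either side."""
    return word.lower() in tokenize(text)


title = "Ten Quick Pasta Recipes"
print(contains_word("recipes", title))  # True
print(contains_word("recipe", title))   # False
```

Because every remaining query word must match, putting both forms in a single query would narrow the results rather than widen them, so it is usually better to try each form in a separate search.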

Tip: If one of the words in your query is more important to you than the others, you can repeat that word several times in the query to force articles containing that word to have a higher quality ranking.
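One plausible reason the tip works is that a score summed over query words counts a repeated word once per repetition. The sketch below assumes that behavior and a simple TF-IDF variant. Neither is confirmed as the site's implementation, and the sample documents are made up.

```python
import math
import re


def tokenize(text):
    """Lowercase the text and return its runs of ASCII letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query, document, corpus):
    """Sum tf * idf over every query word, counting repeated words each time."""
    words = tokenize(document)
    total = 0.0
    for term in tokenize(query):
        df = sum(1 for doc in corpus if term in tokenize(doc))
        if df == 0 or not words:
            continue
        total += (words.count(term) / len(words)) * math.log(len(corpus) / df)
    return total


def best_match(query, corpus):
    """Return the document with the highest score for the query."""
    return max(corpus, key=lambda doc: score(query, doc, corpus))


corpus = [
    "pasta pasta pasta salad",
    "pasta salad salad salad",
    "garden tools",
    "pasta sauce",
]
print(best_match("pasta salad", corpus))
print(best_match("pasta pasta pasta salad", corpus))
```

With the plain query the salad-heavy document comes first, because "salad" is the rarer word in this sample. Repeating "pasta" three times triples its contribution and moves the pasta-heavy document to the top.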

The search engine output screen contains various options to allow you more flexibility in refining your search.

If you came to the search engine through one of the category pages, the "search" pop-up menu on the output page lets you search only the articles from that category or search "all articles." If the category has subcategories, the category name will appear twice in the "search" menu--once with "(& subcat)" to indicate that subcategories should be included in the search, and once with "(only)" to indicate that subcategories are not to be included.
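The difference between the "(& subcat)" and "(only)" choices can be sketched as a simple scope test. The slash-separated category paths and the function name are hypothetical; how categories are actually stored is not described.

```python
def in_scope(article_category, selected, include_subcategories):
    """Decide whether an article's category falls inside the selected search scope.

    Categories are modeled as slash-separated paths such as "Health/Nutrition"
    (an assumption made for this sketch).
    """
    if article_category == selected:
        return True
    # "(& subcat)": also accept anything nested below the selected category.
    return include_subcategories and article_category.startswith(selected + "/")


print(in_scope("Health/Nutrition", "Health", include_subcategories=True))   # True
print(in_scope("Health/Nutrition", "Health", include_subcategories=False))  # False
print(in_scope("Health", "Health", include_subcategories=False))            # True
```

Appending the separator before the prefix test keeps an unrelated category such as "Healthcare" from being mistaken for a subcategory of "Health".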

You can use the "order by" pop-up menu to sort by:

quality of match
How well the document matches your query as described above. This is the default.
date
The date when MagPortal.com found the article. Normally, this is a good indication (within a business day) of when the publisher put the article on the web (not necessarily when it became available in print). When we index back issues of a publication, we manually set the sorting dates to sensible values for those articles. Note that the sorting date is not the same as the publication date (the date shown in the article listing). We don't sort by publication date because some publishers don't supply one, and others give an imprecise one such as "Fall 1999."
publication
This produces a list of the publications that have articles matching the query, with the number of matching articles shown for each. Clicking a magazine name runs the search again with the results restricted to that magazine. The resulting output page will have the "Search" pop-up menu set to the magazine name instead of "all articles." You can change it back to "all articles" to search articles from all publications.
category
This produces a list of the categories that have articles matching the query, with the number of matching articles shown for each. Clicking a category name runs the search again with the results restricted to that category. The resulting output page will have the "Search" pop-up menu set to the category name instead of "all articles." You can change it back to "all articles" to search articles from all categories.
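The "order by" choices other than quality can be sketched as a sort and a grouped count over the matching articles. The Article record, its field names, the sample data, the newest-first date order, and the alphabetical listing of groups are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class Article:
    title: str
    publication: str
    category: str
    found_on: date  # sorting date: when the site found the article


def order_by_date(results):
    """Sort by the sorting date. Assumption: newest first; the direction is not stated."""
    return sorted(results, key=lambda a: a.found_on, reverse=True)


def count_by(results, field):
    """Return (name, number of matching articles) pairs for 'publication' or 'category'."""
    counts = Counter(getattr(a, field) for a in results)
    return sorted(counts.items())  # assumption: groups are listed alphabetically


def restrict_to(results, field, name):
    """Mimic clicking a name in the list: keep only that publication's or category's articles."""
    return [a for a in results if getattr(a, field) == name]


results = [
    Article("Quick Pasta", "Example Cooking", "Food", date(2000, 3, 1)),
    Article("Pasta History", "Example History", "Culture", date(2000, 2, 1)),
    Article("Pasta Sauces", "Example Cooking", "Food", date(2000, 1, 15)),
]
print(count_by(results, "publication"))
print([a.title for a in restrict_to(results, "publication", "Example Cooking")])
```

Here the publication view lists "Example Cooking" with 2 matches and "Example History" with 1, and restricting to "Example Cooking" leaves the two pasta articles from that magazine.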