Full-Text Search in the Publication Database

The publication database offers two complementary full-text search algorithms:

The Google-style search function is faster. It returns a list of publication records ordered by their relevance to the search string. Its results may occasionally be unexpected, particularly for short search strings. At least in its default mode, it selects the records that contain at least one of the words in the search string, so adding search words generally increases the number of records found. Please refer to the detailed description below for the proper use and optimisation of the Google-style search function.

The strict full-text search function is activated by setting the checkbox "Strict search" in the field "Search for". It is significantly slower than the Google-style search function, particularly if the other selection criteria restrict the set of publications only slightly or not at all. It returns the records that contain all specified search items, grouped by publication type. Adding search items therefore generally reduces the number of records found. More information on the strict search function is given in the description of this algorithm further down this page.

The Google-style search function only finds words in the publication records that exactly match one of the words in the search string. (With the optional operator "*", it also finds words that begin with one of the words in the search string.) In contrast, the strict full-text search function also finds records containing words that merely include one of the words in the search string. With the search string "electron", the Google-style search function therefore finds only records that contain the word "electron" (in upper- or lowercase), whereas the strict text search function would also find "microelectronics". (With "electron*", the Google-style search function would find "electronics" but not "microelectronics". A leading asterisk ("*"), as in "*electron" or "*electron*", is ignored by the Google-style search function.)
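
The difference between the two matching rules for a single search word can be modelled in a few lines of Python. This is a simplified sketch written for this page, not the code of the publication database or its back-end. The function names `google_style_match` and `strict_match` are hypothetical, word boundaries are approximated with the regular-expression anchor `\b`, and the wildcards of the strict search are not handled.

```python
import re


def google_style_match(word: str, text: str) -> bool:
    """Hypothetical model of the Google-style rule: whole words only.

    A trailing '*' turns the word into a prefix; a leading '*' is ignored.
    """
    word = word.lower().lstrip("*")  # a leading asterisk has no effect
    if word.endswith("*"):
        # Prefix search: the word must start at a word boundary.
        pattern = r"\b" + re.escape(word.rstrip("*"))
    else:
        # Exact word: word boundaries on both sides.
        pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, text.lower()) is not None


def strict_match(word: str, text: str) -> bool:
    """Hypothetical model of the strict rule: any substring, case-insensitive."""
    return word.lower() in text.lower()


print(google_style_match("electron", "Microelectronics"))  # False
print(strict_match("electron", "Microelectronics"))        # True
print(google_style_match("electron*", "Electronics"))      # True
print(google_style_match("*electron", "Electronics"))      # False
```

In this model, a whole-word match requires boundaries on both sides. The prefix form requires a boundary only at the start, which is why "electron*" cannot reach the "electron" inside "microelectronics".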

Both algorithms search only the publication records selected by the other selection criteria. The search may cover entire records or only parts of them. No text search is carried out if the field "Text that should be found" is empty. In this case, all records matching the remaining selection criteria are output. The search is case-insensitive, so upper- and lowercase letters in the search string make no difference.

The Google-style full-text search function

The Google-style full-text search uses a function of the database back-end, so the program code of the publication database has hardly any influence on how it operates. Since it is much faster than the strict text search function, it has been implemented as the default full-text search function.

This full-text search function returns a list of publication records ordered by relevance. For each record, its relevance and its publication type are given. The relevance of a publication record increases with the number of occurrences of any of the words in the search string within the searched part of the record. In addition, the publication type and quality may influence its relevance.

The following rules apply for the Google-style full-text search:

  • Words in the search string with fewer than four characters are ignored.
  • Very common (English) words are ignored.
  • Words that occur in more than half of the records in the database are ignored.
  • Frequently occurring words have a lower relevance (and thus contribute less to the record's relevance) than rarely used ones.
  • In the default mode of the Google-style search function, the records found are those that contain at least one of the words in a multi-word search string. Longer search strings therefore usually return more result records. However, records that contain all the words in the search string receive a significantly higher relevance.
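
The following Python model makes the first four rules concrete. It shows how a default-mode query might be reduced to the words that actually count, with rarer words weighted more heavily. The stopword list, the tokenizer and the logarithmic weight `log(N / df)` are assumptions for illustration only. The back-end's actual word list and relevance formula are not documented on this page.

```python
import math
import re

# Hypothetical stand-in for the back-end's (much longer) list of common English words.
STOPWORDS = {"about", "after", "from", "that", "their", "there", "these", "which", "with"}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into runs of word characters."""
    return re.findall(r"\w+", text.lower())


def document_frequency(word: str, records: list[str]) -> int:
    """Number of records that contain the word at least once."""
    return sum(word in tokenize(r) for r in records)


def effective_terms(query: str, records: list[str]) -> list[str]:
    """Keep only the query words that the default mode would not ignore."""
    kept = []
    for word in tokenize(query):
        if len(word) < 4 or word in STOPWORDS:
            continue  # too short, or a very common English word
        if document_frequency(word, records) > len(records) / 2:
            continue  # occurs in more than half of all records
        kept.append(word)
    return kept


def relevance(record: str, terms: list[str], records: list[str]) -> float:
    """Illustrative score: occurrences weighted by log(N / df), so rare words count more."""
    words = tokenize(record)
    score = 0.0
    for term in terms:
        df = document_frequency(term, records)
        if df:
            score += words.count(term) * math.log(len(records) / df)
    return score


records = [
    "Hot electron transport in silicon",
    "Electron mobility in silicon devices",
    "Silicon solar cells",
    "Phonon scattering in graphene",
]
terms = effective_terms("hot electron in silicon", records)
print(terms)  # ['electron']: 'hot' and 'in' are too short, 'silicon' is in 3 of 4 records
for r in records:
    print(f"{relevance(r, terms, records):.2f}  {r}")
```

In this toy collection, "silicon" is dropped because it appears in more than half of the records. This mirrors the third rule and explains why a query can silently lose a word that seems important.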

A number of operators can modify the search behaviour. If at least one of the following operators is used, the relevance of the records found changes, and search words that occur in more than half of the records are no longer ignored. The following operators are available:

  • "+": A leading plus sign indicates that this word must be present in every record returned.
  • "-": A leading minus sign indicates that this word must not be present in any record returned.
  • ">": This operator is used to change a word's contribution to the relevance value that is assigned to a record. The ">" operator increases the contribution.
  • "<": This operator is used to change a word's contribution to the relevance value that is assigned to a record. The "<" operator decreases the contribution.
  • "( )": Parentheses are used to group words into subexpressions. Parenthesized groups can be nested.
  • "~": A leading tilde acts as a negation operator, causing the word's contribution to the record relevance to be negative. It is useful for marking noise words. A record that contains such a word is rated lower than others, but is not excluded altogether, as it would be with the "-" operator.
  • "*": An asterisk is the truncation operator. Unlike the other operators, it should be appended to the word.
  • Double quotes (" "): A phrase that is enclosed in double quote characters matches only records that contain the phrase literally, as it was typed.

The results of a search may differ from those described here if operators are combined. The Google-style full-text search is a feature of the database back-end and hence beyond the control of the developer of the publication database.

The following examples - taken from the documentation of the database back-end - may serve to illustrate the usage of these operators:

  • "apple banana": Find records that contain at least one of the two words.
  • "+apple +juice": Find records that contain both words.
  • "+apple macintosh": Find records that contain the word "apple", but rank records higher if they also contain "macintosh".
  • "+apple -macintosh": Find records that contain the word "apple" but not "macintosh".
  • "+apple +(>turnover <strudel)": Find records that contain the words "apple" and "turnover", or "apple" and "strudel" (in any order), but rank "apple turnover" higher than "apple strudel".
  • "apple*": Find records that contain words such as "apple", "apples", "applesauce", or "applet".
  • '"some words"': Find records that contain the exact phrase "some words" (for example, records that contain "some words of wisdom" but not "some noise words"). Note that the double quote characters (") that surround the phrase are operator characters that delimit the phrase.
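
The simplest operators in these examples ("+", "-", "~", a trailing "*" and plain words) can be modelled as a small toy evaluator in Python. This is a sketch under stated assumptions, not the back-end's algorithm. The function `boolean_search` is hypothetical, and it ignores parentheses, "<", ">", phrases, the minimum word length and stopwords. It scores a record simply by counting matched words, minus one for each matched "~" word.

```python
import re


def words_of(text: str) -> set[str]:
    """Set of lower-cased words in a record."""
    return set(re.findall(r"\w+", text.lower()))


def term_matches(term: str, words: set[str]) -> bool:
    """Exact word match, or prefix match for a trailing '*'."""
    if term.endswith("*"):
        return any(w.startswith(term[:-1]) for w in words)
    return term in words


def boolean_search(query: str, records: list[str]) -> list[tuple[int, str]]:
    """Toy model of '+', '-', '~' and plain words; returns (score, record), best first."""
    required, excluded, negated, optional = [], [], [], []
    for token in query.lower().split():
        op, term = token[0], token[1:]
        if op == "+":
            required.append(term)
        elif op == "-":
            excluded.append(term)
        elif op == "~":
            negated.append(term)
        else:
            optional.append(token)

    results = []
    for record in records:
        words = words_of(record)
        if not all(term_matches(t, words) for t in required):
            continue  # a '+' word is missing
        if any(term_matches(t, words) for t in excluded):
            continue  # a '-' word is present
        hits = sum(term_matches(t, words) for t in optional)
        if not required and hits == 0:
            continue  # without '+', at least one plain word must occur
        score = len(required) + hits - sum(term_matches(t, words) for t in negated)
        results.append((score, record))
    # Stable sort: records with equal scores keep their original order.
    results.sort(key=lambda pair: -pair[0])
    return results


records = [
    "Apple juice recipes",
    "Apple macintosh history",
    "Banana bread",
    "Apple strudel",
    "Homemade applesauce",
    "Cherry pie",
]
print(boolean_search("+apple macintosh", records)[0])  # (2, 'Apple macintosh history')
```

The "-" operator removes a record outright, while "~" only lowers its score. With "+apple ~strudel", the strudel record is still returned, but it comes last.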

The strict full-text search function

This search mode is activated by setting the checkbox "Strict search" in the field "Search for".

If the search string contains several words, the Publication Database by default returns all records that match the other selection criteria and contain every word of the search string. The order of these words does not matter, nor does the searched field in which each word occurs. The following characters separate words:

  • Space
  • , (Comma)
  • ; (Semicolon)
  • + (Plus)
All other characters, in particular parentheses around the search string, are interpreted as part of the search text!

The search text "This is an example" results in a search for records that contain the four words "This", "is", "an", and "example" in arbitrary order and in arbitrary locations within the text fields that have been specified with the selection list "Text search in:".

Search items that consist of, or contain, one of the above separators but should not be split at that separator can be enclosed in double quotes ("). Pairs of quotes are removed before the actual search, but they prevent the text between them from being split. The search text may contain any number of search items in double quotes. Hence, the search text '"This is" "an example"' results in the two search items "This is" and "an example". To search for one of the above separators, place it between double quotes (e.g., '"+"' to search for a plus sign). Search items that contain a single, unpaired double quote are also possible. They must be placed after all search items enclosed in pairs of double quotes.
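
A single regular expression can express these splitting rules: the separators, quoted items and the removal of paired quotes. The Python function `split_search_text` below is a hypothetical sketch of the rules and not the database's own parser. In particular, it does not enforce that items containing an unpaired quote come last.

```python
import re

# A token is either a pair of double quotes with anything (except a quote)
# between them, or a run of characters that are not separators.
_TOKEN = re.compile(r'"([^"]*)"|([^ ,;+]+)')


def split_search_text(text: str) -> list[str]:
    """Hypothetical sketch: split a strict-search string into search items."""
    items = []
    for match in _TOKEN.finditer(text):
        quoted, plain = match.groups()
        # Paired quotes are dropped; the text between them stays in one piece.
        item = quoted if quoted is not None else plain
        if item:  # skip empty pairs of quotes ("")
            items.append(item)
    return items


print(split_search_text('This is an example'))      # ['This', 'is', 'an', 'example']
print(split_search_text('"This is" "an example"'))  # ['This is', 'an example']
print(split_search_text('"+"'))                     # ['+']
```

Characters such as parentheses are not separators in this sketch, so "(electron)" stays a single item including the parentheses, just as the rule above states.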

You can also put the entire search text in double quotes, e.g., '"This is an example"'. Use this feature if you are fairly sure that some records contain your exact search phrase but the default settings return too many hits.

Please note:

  • The text to be searched for can only be found if it is literally contained in a publication record. ("Literally" includes the extensions described below.) The strict search algorithm offers no way of searching for similar words!
  • The search is case-insensitive. With the search string "electron" you can find records containing "electron", "Electron", "ELECTRON" or this word in any other combination of upper- and lowercase letters.
  • The database also returns records in which your search text appears only as part of a word. The search text "electron", for example, also finds records containing "electronic" or "nanoelectronic", but not "electromagnetic".
  • The characters "%" and "_" serve as wildcards for an arbitrary number of characters and exactly one character, respectively, in the field "Search for". By entering "ele_tron" or "ele%tron", you will find records with the German ("Elektron") and with the English version ("electron") of this word, plus all records with words that contain either of these strings. "electr%ic" matches "electric" as well as "electronic" or "electromagnetic". (The sequence "\%" allows a search for "%", and the sequence "\_" permits searching for "_". It is not possible to search for "\".)
  • You can put double quotes around a search item to search for several contiguous words of the search string. The search function splits the search string at the above separators, including spaces, into separate search items. If you enter "hot electron" as a search string, the search returns all records that contain these two words anywhere in the fields to be searched. Use '"hot electron"' as a search string to search for exactly the term "hot electron".
  • A search in publication records (option "Search in publication records" on the selection page) can only find data contained in an "extended publication record", i.e., in the references themselves plus abstracts or keywords. The publication records do not contain the full first names of authors! Therefore, you should search for authors using only the last name (or the last name plus the abbreviated first name). You can find the publications of John Doe with "Doe", "J. Doe" or "Doe J." but not with "John Doe" or "Doe John".
  • Please note that only some of the publication records contain abstracts or keywords. The option "Text search in:" - "Abstracts" is therefore unlikely to return comprehensive results!
  • Start your search with rather generic search terms, and limit your search only if necessary. Any additional word in the search text will further limit the scope of your search!
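
The wildcard and case rules in these notes behave much like SQL's LIKE operator with an escape character. This page does not say that the strict search is implemented that way, so treat the following as an analogy only. The Python sketch uses the standard library's `sqlite3` module, a hypothetical in-memory table `records` with the columns `title` and `keywords`, and a hypothetical function `strict_search`. It shows how every search item must occur, as a substring, in at least one searched field. Note that SQLite's LIKE is case-insensitive only for ASCII letters.

```python
import sqlite3

# Hypothetical demo table; the real database schema is not documented here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, title TEXT, keywords TEXT)")
conn.executemany(
    "INSERT INTO records (title, keywords) VALUES (?, ?)",
    [
        ("Hot electron effects in MOSFETs", "semiconductors"),
        ("Nanoelectronic devices", "microelectronics"),
        ("Das freie Elektron", "Physik"),
        ("Electromagnetic compatibility", "EMC"),
        ("Converters with 50% duty cycle", "power"),
    ],
)


def strict_search(conn: sqlite3.Connection, items: list[str]) -> list[str]:
    """Titles of records in which every item occurs in the title or the keywords.

    '%' and '_' inside an item act as wildcards; '\\%' and '\\_' match them literally.
    """
    clauses, params = [], []
    for item in items:
        pattern = "%" + item + "%"  # substring match: wildcards on both sides
        clauses.append("(title LIKE ? ESCAPE '\\' OR keywords LIKE ? ESCAPE '\\')")
        params += [pattern, pattern]
    # No items: no text search, so every record is returned.
    where = " AND ".join(clauses) if clauses else "1"
    sql = f"SELECT title FROM records WHERE {where} ORDER BY id"
    return [row[0] for row in conn.execute(sql, params)]


print(strict_search(conn, ["electron"]))   # 'Hot electron ...' and 'Nanoelectronic devices'
print(strict_search(conn, ["ele_tron"]))   # additionally 'Das freie Elektron'
print(strict_search(conn, ["electr%ic"]))  # 'Nanoelectronic devices' and 'Electromagnetic ...'
```

Each additional item adds another AND clause. This is why, in this model as in the notes above, every extra word can only narrow the result.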