Friday, August 10, 2007

Thesaurus

The word thesaurus is derived from 16th-century New Latin, in turn from Latin thesaurus, from ancient Greek θησαυρός thesauros, "store-house", "treasury". Besides its meaning as a treasury or storehouse, it more commonly means a listing of words with similar, related, or opposite meanings (this newer meaning dates back to Roget's Thesaurus). The term can also refer to a book of jargon for a specialized field or, more technically, to a list of subject headings and cross-references used in the filing and retrieval of documents (or indeed papers, certificates, letters, cards, records, texts, files, articles, essays and perhaps even manuscripts), film, sound recordings, machine-readable media, etc.

The first example of this genre, Roget's Thesaurus, was published in 1852, having been compiled earlier, in 1805, by Peter Roget. Entries in Roget's Thesaurus are listed conceptually rather than alphabetically, and they are a great resource for writers.

Although a thesaurus includes synonyms and antonyms, its entries should not be taken as simple lists of them. The entries are also designed to draw distinctions between similar words and to help in choosing exactly the right word. Nor does a thesaurus entry define words; that work is left to the dictionary.

In information technology, a thesaurus represents a database or list of semantically orthogonal topical search keys. In the field of artificial intelligence, a thesaurus may sometimes be referred to as an ontology.

Thesaurus databases, created according to international standards, are generally arranged hierarchically by themes and topics. Such a thesaurus places each term in context, allowing a user to distinguish between "bureau" the office and "bureau" the furniture. A thesaurus of this type is often used as the basis of an index for online material. The Art and Architecture Thesaurus, for example, is used to index the national databases of museums, Artefacts Canada, held by the Canadian Heritage Information Network (CHIN).

The most significant thesaurus project of recent years is the Historical Thesaurus of English (HTE), an ongoing project based at the University of Glasgow. The HTE is a complete database of all the words in the second edition of the Oxford English Dictionary, arranged by semantic field and date. In this way, the HTE arranges the whole vocabulary of English from the earliest written records (in Anglo-Saxon) to the present alongside types and dates of use. It is the first historical thesaurus to be compiled for any of the world's languages and has been in progress since 1964. The HTE project has also produced the Thesaurus of Old English, derived from the whole HTE database (published in 1995 and 2000, and now freely available online).

Definition
A formal definition of a thesaurus designed for indexing is:
a list of every important term (single-word or multi-word) in a given domain of knowledge; and
a set of related terms for each term in the list.
Terms are the basic semantic units for conveying concepts. They are usually single-word nouns, since nouns are the most concrete part of speech. Verbs can be converted to nouns -- cleans to cleaning, reads to reading, and so on. Adjectives and adverbs, however, seldom convey any meaning useful for indexing. When a term is ambiguous, a “scope note” can be added to ensure consistency, and give direction on how to interpret the term. Naturally, not every term needs a scope note, but their presence is of considerable help in using a thesaurus correctly and reaching a correct understanding of the given field of knowledge.
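The definition above can be sketched as a small data structure. This is only an illustration, not a standard format: the `Entry` class, the `describe` helper, and the sample terms and scope note are all made up for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Entry:
    """One thesaurus term with an optional scope note and its related terms."""
    term: str
    scope_note: str = ""  # only filled in when the term is ambiguous
    related: set[str] = field(default_factory=set)


# Hypothetical miniature thesaurus, for illustration only.
thesaurus = {
    "Cleaning": Entry("Cleaning", related={"Maintenance"}),
    "Bureau": Entry(
        "Bureau",
        scope_note="Use for the office, not the piece of furniture.",
        related={"Office"},
    ),
}


def describe(term: str) -> str:
    """Return a one-line summary of an entry, with its scope note if any."""
    entry = thesaurus[term]
    note = f" (scope note: {entry.scope_note})" if entry.scope_note else ""
    return f"{entry.term}{note} -> {sorted(entry.related)}"


print(describe("Cleaning"))
print(describe("Bureau"))
```

Keeping the scope note as an optional field reflects the point that not every term needs one.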

Term relationships are links between terms that often describe synonyms, near-synonyms, or hierarchical relations. Synonyms and near-synonyms are indicated by a Related Term (RT). The way the term "Cybernetics" is related to the term "Computers" is an example of such a relationship. Hierarchical relationships are used to indicate terms which are narrower and broader in scope. A Broader Term (BT) is a more general term, e.g. "Apparatus" is a generalization of "Computers". Reciprocally, a Narrower Term (NT) is a more specific term, e.g. "Digital Computers" is a specialization of "Computers". BT and NT are reciprocals; a broader term necessarily implies at least one other term which is narrower. Thesaurus designers are generally careful to ensure that BT and NT indicate class relationships, as distinguished from part-whole relationships. Some thesauri also include Use (USE) and Used For (UF) indicators when an authorized term is to be used for another, unauthorized, term; for example the entry for the authorized term "Frequency" could have the indicator "UF Pitch". Reciprocally, the entry for the unauthorized term "Pitch" would have the indicator "USE Frequency".
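The reciprocal nature of these indicators can be sketched in a few lines of Python. This is a minimal illustration under assumptions of my own: the `Thesaurus` class, its method names, and the `RECIPROCAL` table are hypothetical, RT is treated as symmetric, and term spellings are normalised to the plural for the example.

```python
from collections import defaultdict

# Each indicator maps to its reciprocal; RT is assumed to be symmetric.
RECIPROCAL = {"BT": "NT", "NT": "BT", "RT": "RT", "USE": "UF", "UF": "USE"}


class Thesaurus:
    def __init__(self):
        # term -> indicator -> set of target terms
        self.links = defaultdict(lambda: defaultdict(set))

    def add(self, term, code, target):
        """Record 'term code target' and the reciprocal link on the target."""
        self.links[term][code].add(target)
        self.links[target][RECIPROCAL[code]].add(term)

    def entry(self, term):
        """Return the links of a term as sorted 'CODE target' strings."""
        return sorted(
            f"{code} {target}"
            for code, targets in self.links[term].items()
            for target in targets
        )

    def preferred(self, term):
        """Follow a USE link to the authorized term, if there is one."""
        targets = self.links[term]["USE"]
        return next(iter(targets)) if targets else term


t = Thesaurus()
t.add("Computers", "BT", "Apparatus")
t.add("Computers", "NT", "Digital Computers")
t.add("Computers", "RT", "Cybernetics")
t.add("Pitch", "USE", "Frequency")

print(t.entry("Computers"))
print(t.entry("Frequency"))
print(t.preferred("Pitch"))
```

Recording the reciprocal link inside `add` means an entry such as "Frequency" picks up its "UF Pitch" indicator without a second call.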
