Content based Image Retrieval (CBIR) using MATLAB
Table of Contents:
1.1 Aim of the Project
The aim of this project is to review the current state of the art in content-based image retrieval (CBIR), a technique for retrieving images on the basis of automatically-derived features such as color, texture and shape. Our findings are based both on a review of the relevant literature and on discussions with researchers in the field.
The need to find a desired image from a collection is shared by many professional groups, including journalists, design engineers and art historians. While the requirements of image users can vary considerably, it can be useful to characterize image queries into three levels of abstraction: primitive features such as color or shape, logical features such as the identity of objects shown and abstract attributes such as the significance of the scenes depicted. While CBIR systems currently operate effectively only at the lowest of these levels, most users demand higher levels of retrieval.
1.2 General Introduction
1.2.1 Content Based Image Retrieval
Content-based image retrieval (CBIR), also known as query by image content (QBIC) and content-based visual information retrieval (CBVIR) is the application of computer vision to the image retrieval problem, that is, the problem of searching for digital images in large databases.
"Content-based" means that the search will analyze the actual contents of the image. The term 'content' in this context might refer colors, shapes, textures, or any other information that can be derived form the image itself. Without the ability to examine image content, searches must rely on metadata such as captions or keywords. Such metadata must be generated by a human and stored alongside each image in the database.
Problems with traditional methods of image indexing [Enser,1995] have led to the rise of interest in techniques for retrieving images on the basis of automatically-derived features such as color, texture and shape – a technology now generally referred to as Content-Based Image Retrieval (CBIR). However, the technology still lacks maturity, and is not yet being used on a significant scale. In the absence of hard evidence on the effectiveness of CBIR techniques in practice, opinion is still sharply divided about their usefulness in handling real-life queries in large and diverse image collections. The concepts which are presently used for CBIR system are all under research.
Let us start with the word “image”. The surrounding world is composed of images. Humans are using their eyes, containing 1.5x10^8 sensors, to obtaining images from the surrounding world in the visible portion of the electromagnetic spectrum (wavelengths between 400 and 700 nanometers). The light changes on the retina are sent to image processor center in the cortex.
In the image database systems geographical maps, pictures, medical images, pictures in medical atlases, pictures obtaining by cameras, microscopes, telescopes, video cameras, paintings, drawings and architectures plans, drawings of industrial parts, space images are considered as images.
There are different models for color image representation. In the seventeen century Sir Isaac Newton showed that a beam of sunlight passing through a glass prism comes into view as a rainbow of colors. Therefore, he first understood that white light is composed of many colors. Typically, the computer screen can display 2^8 or 256 different shades of gray. For color images this makes 2^(3x8) = 16,777,216 different colors.
Clerk Maxwell showed in the late nineteen century that every color image cough be created using three images – Red, green and Blue image. A mix of these three images can produce every color. This model, named RGB model, is primarily used in image representation. The RGB image could be presented as a triple(R, G, B) where usually R, G, and B take values in the range [0, 255]. Another color model is YIQ model (lamination (Y) , phase (I), quadrature phase (Q)). It is the base for the color television standard. Images are presented in computers as a matrix of pixels. They have finite area. If we decrease the pixel dimension the pixel brightness will become close to the real brightness. The same image with different pixel dimension is shown below.
1.2.3 Image Database systems
Set of images are collected, analyzed and stored in multimedia information systems, office systems, Geographical information systems(GIS), robotics systems , CAD/CAM systems, earth resources systems, medical databases, virtual reality systems,
information retrieval systems, art gallery and museum catalogues, animal and plant atlases, sky star maps, meteorological maps, catalogues in shops and many other places.
There are sets of international organizations dealing with different aspects of image storage, analysis and retrieval. Some of them are: AIA (Automated Imaging/Machine vision), AIIM (Document imaging), ASPRES (Remote Sensing/Protogram) etc.
There are also many international centers storing images such as : Advanced imaging, Scientific/Industrial Imaging, Microscopy imaging, Industrial Imaging etc. There are also different international work groups working in the field of image compression, TV images, office documents, medical images, industrial images, multimedia images, graphical images, etc.
1.2.4 Logical Image Representation in Database Systems:
The logical image representation in image databases systems is based on different image data models. An image object is either an entire image or some other meaningful portion (consisting of a union of one or more disjoint regions) of an image. The logical image description includes: meta, semantic, color, texture, shape, and spatial attributes.
Color attributes could be represented as a histogram of intensity of the pixel colors. A histogram refinement technique is also used by partitioning histogram bins based on the spatial coherence of pixels. Statistical methods are also proposed to index an image by color correlograms, which is actually a table containing color pairs, where the k-th entry for <i,j> specifies the probability of locating a pixel of color j at a distance k from a pixel of color I in the image.
1.2.5 Classification and indexing schemes
Many picture libraries use keywords as their main form of retrieval – often using indexing schemes developed in-house, which reflect the special nature of their collections. A good example of this is the system developed by Getty Images to index their collection of contemporary stock photographs. Their thesaurus comprises just over 10 000 keywords, divided into nine semantic groups, including geography, people, activities and concepts. Index terms are assigned to the whole image, the main objects depicted, and their setting. Retrieval software has been developed to allow users to submit and refine queries at a range of levels, from the broad (e.g. “freedom”) to the specific (e.g. “a child pushing a swing”).
Probably the best-known indexing scheme in the public domain is the Art and Architecture Thesaurus (AAT), originating at Rensselaer Polytechnic Institute in the early 1980s, and now used in art libraries across the world. AAT is maintained by the Getty Information Institute and consists of nearly 120,000 terms for describing objects, textural materials, images, architecture and other cultural heritage material. There are seven facets or categories which are further subdivided into 33 sub facets or hierarchies. The facets, which progress from the abstract to the concrete, are: associated concepts, physical attributes, styles and periods, agents, activities, materials, and objects. AAT is available on the Web from the Getty Information Institute at http://www.ahip.getty.edu/aat_browser/. Other tools from Getty include the Union List of Artist Names (ULAN) and the Getty Thesaurus of Geographic Names (TGN). Another popular source for providing subject access to visual material is the Library of Congress Thesaurus for Graphic Materials (LCTGM). Derived from the Library of Congress Subject Headings (LCSH), LCTGM is designed to assist with the indexing of historical image collections in the automated environment. Greenberg  provides a useful comparison between AAT and LCTGM.
A number of indexing schemes use classification codes rather than keywords or subject descriptors to describe image content, as these can give a greater degree of language independence and show concept hierarchies more clearly. Examples of this genre include ICONCLASS from the University of Leiden [Gordon, 1990], and TELCLASS from the BBC [Evans, 1987]. Like AAT, ICONCLASS was designed for the classification of works of art, and to some extent duplicates its function; an example of its use is described by Franklin . TELCLASS was designed with TV and video programmes in mind, and is hence rather more general in its outlook. The Social History and Industrial Classification, maintained by the Museum Documentation Association, is a subject classification for museum cataloguing. It is designed to make links between a wide variety of material including objects, photographs, archival material, tape recordings and information files.
A number of less widely-known schemes have been devised to classify images and drawings for specialist purposes. Examples include the Vienna classification for trademark images [World Intellectual Property Organization, 1998], used by registries Worldwide to identify potentially conflicting trademark applications, and the Opitz coding system for machined parts [Opitz et al, 1969], used to identify families of similar parts which can be manufactured together.
A survey of art librarians conducted for this report suggests that, despite the existence of specialist classification schemes for images, general classification schemes, such as Dewey Decimal Classification (DDC), Library of Congress (LC), BLISS and the Universal Decimal Classification (UDC), are still widely used in photographic, slide and video libraries. The former scheme is the most popular, which is not surprising when one considers the dominance of DDC in UK public and academic library sectors. ICONCLASS, AAT, LCTGM, SHIC are all in use in at least one or more of the institutions in the survey. However, many libraries and archives use in-house schemes for the description of the subject content. For example, nearly a third of all respondents have their own in-house scheme for indexing slides.
When discussing the indexing of images and videos, one needs to distinguish between systems which are geared to the formal description of the image and those concerned with subject indexing and retrieval. The former is comparable to the bibliographical description of a book. However, there is still no one standard in use for image description, although much effort is being expended in this area by a range of organizations such as the Museum Documentation Association, the Getty Information Institute, the Visual Resources Association the International Federation of Library Association/Art Libraries and the International Committee for Documentation (CIDOC) of the International Council of Museums (ICOM).
The descriptive cataloguing of photographs presents a number of special challenges. Photographs, for example, are not self-identifying. Unlike textual works that provide such essential cataloguing aids as title pages, abstracts and table of contents, photographs often contain no indication of author or photographer, names of persons or places depicted dates, or any textual information whatever. Cataloguing of images is more complex than that for text documents, since records should contain information about the standards used for image capture and how the data is stored as well as descriptive information, such as title, photographer (or painter, artist, etc). In addition, copies of certain types of images may involve many layers of intellectual property rights, pertaining to the original work, its copy (e.g. a photograph), a digital image scanned from the photograph, and any subsequent digital image derived from that image.
Published reviews of traditional indexing practices for images and video include many writers discuss the difficulties of indexing images. The problems of managing a large image collection. He notes that, unlike books, images make no attempt to tell us what they are about and that often they may be used for purposes not anticipated by their originators. Images are rich in information and can be used by researchers from a broad range of disciplines. As Baser comments:
“A set of photographs of a busy street scene a century ago might be useful to historians wanting a ‘snapshot’ of the times, to architects looking at buildings, to urban planners looking at traffic patterns or building shadows, to cultural historians looking at changes in fashion, to medical researchers looking at female smoking habits, to sociologists looking at class distinctions, or to students looking at the use of certain photographic processes or techniques.”
Svenonius  discusses the question of whether it is possible to use words to express the “abruptness of a work in a wordless medium, like art. To get around the problem of the needs of different users groups, van der Starre  advocates that indexers should “stick to ‘plain and simple’ indexing, using index terms accepted by the users, and using preferably a thesaurus with many lead-ins,” thus placing the burden of further selection on the user. Shatford Layne (1994) suggests that, when indexing images, it may be necessary to determine which attributes provide useful groupings of images; which attributes provide information that is useful once the images are found; and which attributes may, or even should, be left to the searcher or researcher to identify. She also advocates further research into the ways images are sought and the reasons that they are useful in order to improve the indexing process. Constantopulos and Doerr (1995) also support a user centred approach to the designing of effective image retrieval systems. They urge that attention needs to be paid to the intentions and goals of the users, since this will help define the desirable descriptive structures and retrieval mechanisms as well as understanding what is ‘out of the scope’ of an indexing system.
When it comes to describing the content of images, respondents in our own survey seem to include a wide range of descriptors including title, period, genre, subject headings, keywords, classification and captions (although there was some variation by format). Virtually all maintain some description of the subject content of their images. The majority of our respondents maintain manual collections of images, so it is not surprising that they also maintain manual indexes. Some 11% of respondents included their photographs and slides in the online catalogues, whilst more than half added their videos to their online catalogues. Standard text retrieval or database management systems were in use in a number of libraries (with textual descriptions only for their images). Three respondents used specific image management systems: Index+, iBase and a bespoke in-house system. Unsurprisingly, none currently use CBIR software.
1.2.6 Research Trends in the Image Database Systems
Most image database systems are products of research, and therefore emphasize only one aspect of content-based retrieval. Sometimes this is the sketching capability in the user interface; sometimes it is a new indexing data structure, etc. Some systems are created as a research version and a commercial product. The commercial version is usually less advanced, and shows more standard searching capabilities. A number of systems provide user interface that allows more powerful query formulation than is useful in demo system. Most systems use color and texture features, few systems use shape features, and yet less use spatial features. The retrieval on color usually yield images that have similar colors. The larger the collection of images, the greater is the chance that it contains an image similar to the query image.