Design, query, and evaluate information retrieval systems.
We live in an age when information is ubiquitous. We find and use structured information in our professional and personal lives every day, from checking movie showtimes and visiting a favorite website to tracking time and payroll information at work. All of this data is stored in a database or information system, and these systems require people to design the structure of the data and manage its contents. For information professionals, this work takes shape in the design and administration of systems that house public, academic, and professional reference materials, both digital and analog.
Design is at the core of an information retrieval system. Herb Simon, an influential thought leader in professional design, believed that “what professionals do in their jobs… is to take an existing state of affairs, a problem, and transfer it into a preferred state, a solution” (Weedman, 2018, p. 171). Information professionals design many types of programs and initiatives to address problems or needs, and, as with any design, information retrieval system design begins with creating groups and subgroups and deciding how to handle the items that do not fit neatly into either, so that users can discover the aggregated information they need (Weedman, 2018). The items being aggregated are information-bearing entities, or documents, which can take many forms, including “books, scholarly journals, digital images, dolls, XML files, collections of presidential papers, MP3 files, picture books for children, and corporate reports, among many, many other things” (Weedman, 2018, p. 173). Designing these systems requires a depth of knowledge about the system's users, their needs, and their abilities; it also requires a depth of knowledge about the domain of assets and the insight and flexibility to embrace the compromises needed to guide those users to the information (Weedman, 2018). The solution design process involves defining the problem to be solved, conducting inquiry, and interacting with the available assets to see how they respond (Weedman, 2018). Rarely is there a single solution to any problem, and it is the job of the information professional to determine which solution for both storing and retrieving information best solves the problem at hand.
An information retrieval system builds an environment of structured data to describe and discover unstructured documents. The fields that describe the data are attributes or categories, and they generally use a controlled vocabulary in which the information professional identifies preferred terms that isolate the different facets of each attribute. These facets are assigned to the unstructured items as metadata, providing context and aiding delivery. Metadata is an integral component in the storage of any domain of information: it allows documents to be efficiently represented in the storage system and enables the discovery of those items. The National Information Standards Organization (NISO) defines metadata as “structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource” (Weedman, 2018, p. 175). This mission to retrieve, use, or manage a resource can look very different across domains of information: metadata may contain author or subject information, copyright information, or data on an asset's previous usage.
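As a minimal illustration of this relationship, a metadata record can be sketched as structured fields whose subject values are constrained to a controlled vocabulary. The field names and vocabulary terms below are hypothetical, chosen only to show the pattern:

```python
# A sketch of a metadata record whose "subject" facet is limited to a
# controlled vocabulary of preferred terms (all names here are illustrative).

CONTROLLED_SUBJECTS = {"oceanography", "marine biology", "climate"}  # preferred terms

def make_record(title, creator, subjects, rights):
    """Build a structured metadata record, rejecting uncontrolled terms."""
    invalid = set(subjects) - CONTROLLED_SUBJECTS
    if invalid:
        raise ValueError(f"Not in controlled vocabulary: {invalid}")
    return {
        "title": title,          # descriptive metadata
        "creator": creator,
        "subject": sorted(subjects),
        "rights": rights,        # administrative metadata
    }

record = make_record("Kelp Forest B-Roll", "Video Production Team",
                     ["oceanography", "marine biology"], "internal use only")
```

Rejecting terms outside the vocabulary at tagging time is what keeps the facet values discrete and searchable later.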
In the design or selection of the retrieval system, it is important to consider the features that aid discoverability, such as which elements the search examines to deliver results, how rich the organizational structure is compared to the taxonomy of the assets, and whether partial matches are returned (Weedman, 2018). Generally, instead of searching the contents of the items in the collection, search engines look at the indexed information from those items, which introduces a lag while items are being indexed: new additions to the information store are not immediately discoverable by the retrieval system. The search method and algorithms at play in the retrieval system can impact the content strategy of the items themselves, pushing desired items earlier in the search results (Weedman, 2018). Some services automate this relationship between storage and retrieval in out-of-the-box solutions. These include integrated library systems, electronic resource management systems, information service platforms, technology managers and library services platforms, online catalogs, discovery interfaces, index-based discovery services, and library portals (Breeding, 2015).
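The indexing lag described above can be sketched with a toy inverted index (the documents and terms are illustrative, not drawn from any real product). A search consults the index rather than the documents themselves, so an item added after the last indexing pass is invisible until the next one:

```python
# A toy inverted index: term -> set of document ids. Searches only see
# what has been indexed, so new documents are undiscoverable until reindexed.
from collections import defaultdict

documents = {1: "annual payroll report", 2: "campus film screening schedule"}
index = defaultdict(set)

def reindex():
    """Rebuild the index from the current document store."""
    index.clear()
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)

def search(term):
    return index.get(term.lower(), set())

reindex()
documents[3] = "payroll system upgrade notes"  # added AFTER the indexing pass
print(search("payroll"))  # {1} — document 3 exists but is not yet discoverable
reindex()
print(search("payroll"))  # {1, 3} — discoverable after the next indexing pass
```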
It is important to understand the different types of systems that can be queried. Databases hold a finite amount of data and give the administrator granular control over which assets will be indexed and discoverable (Weedman, 2018). In a database, it is essential to know what the default search is; that is, when a user enters a query in the primary search field, which content or metadata areas are actually being searched. Databases also do not allow for variations of terms or names unless those variants are programmed in as additional values in the controlled vocabulary, which can impact discoverability if the wrong variation of a name is used in a search. “The belief is that if ambiguous words can be correctly disambiguated, IR performance will increase” (Sanderson, 1994, p. 142).
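The idea of a default search can be sketched as follows, with hypothetical records and field names: the primary search box examines only certain metadata fields, so a match that exists elsewhere in the record is silently missed.

```python
# A sketch of a "default search": only the fields in DEFAULT_FIELDS are
# examined, so content held in other fields is invisible to this search.
# Records and field names are hypothetical.

records = [
    {"id": 1, "title": "Coastal Survey", "creator": "J. Muir", "notes": "drone footage"},
    {"id": 2, "title": "Drone Safety Training", "creator": "Risk Office", "notes": ""},
]
DEFAULT_FIELDS = ("title", "creator")  # "notes" is NOT searched by default

def default_search(term):
    term = term.lower()
    return [r["id"] for r in records
            if any(term in r[f].lower() for f in DEFAULT_FIELDS)]

print(default_search("drone"))  # [2] — record 1 matches only in "notes"
```

Knowing which fields sit behind the primary search box tells the searcher when a query must be redirected to an advanced or field-specific search.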
Internet searches have conditioned many users to expect that a simple search will return all desired results, but searches in different systems can be very different and require a varied approach. The better the storage system design is understood, the better retrieval efforts will be (Weedman, 2018). For instance, if metadata or tagging is utilized in the system, it is important to know whether a controlled vocabulary is used and, if so, what its facets are and whether ambiguous terms are allowed.
Perhaps the most important part of a search strategy is simply understanding the structure of the data in the database, but other considerations apply as well. Boolean searches, for example, provide flexibility to expand and contract the search scope through simple operators such as AND, OR, and NOT. This simple method of search discrimination is becoming more popular in database and information retrieval systems. Other approaches use a subject hierarchy that allows users to home in on their desired topic, making the users' familiarity with the underlying structure even more important.
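The Boolean operators can be sketched as set operations over an index that maps terms to the documents containing them (the index data here is illustrative): AND narrows the result set, OR broadens it, and NOT excludes a set.

```python
# Boolean search discrimination as set algebra over term postings.
# The index contents are illustrative only.

index = {
    "archives": {1, 2, 4},
    "digital":  {2, 3, 4},
    "analog":   {1, 5},
}
all_docs = {1, 2, 3, 4, 5}

def AND(a, b):
    return a & b            # contract the scope: both terms must match

def OR(a, b):
    return a | b            # expand the scope: either term matches

def NOT(a):
    return all_docs - a     # exclude documents matching the term

# "archives AND digital NOT analog"
result = AND(index["archives"], index["digital"]) & NOT(index["analog"])
print(result)  # {2, 4}
```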
When evaluating an information retrieval system, it is important to look at the relevance it provides: it should retrieve the information the user is interested in and avoid the information the user is not. Bill Maron, a pioneer of search engine design, defined the goal of an information retrieval system as delivering “all and only the relevant information” (Weedman, 2018, p. 182) that the user wants. What is relevant can be incredibly hard to predict from user to user, but it is an important consideration in evaluation because it informs the common measurements: recall, or how completely the system delivers the full body of relevant assets, and precision, or how close the system comes to delivering only the relevant assets (Weedman, 2018). Evaluation also involves considering how assets are represented in the system through metadata, to make sure the representation is robust enough to be meaningful but sparse enough to be usable. It is likewise paramount that metadata terms be objective and discrete, to avoid confusion when either tagging items with their attributes or searching them. Finally, evaluation should assess the usability of the system, which can include aspects such as the site layout, whether the content and external links are current, and the amount of information found at each destination (Weedman, 2018). The design of an information retrieval system needs to address not only the structure of the data and metadata in the system but also the needs and preferences of the user, to ensure both discoverability and findability.
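Maron's “all and only” goal maps directly onto the two standard measures. Given a set of judged-relevant items and a set of retrieved items (the IDs below are hypothetical), recall is the share of the relevant set that was retrieved (“all”), and precision is the share of the retrieved set that is relevant (“only”):

```python
# Recall and precision from relevance judgments (hypothetical document ids).

relevant  = {1, 2, 3, 4}     # items judged relevant to the query
retrieved = {2, 3, 4, 7, 9}  # items the system actually returned

hits = relevant & retrieved  # relevant items that were retrieved

recall    = len(hits) / len(relevant)   # 3/4 = 0.75 — how much of "all"
precision = len(hits) / len(retrieved)  # 3/5 = 0.60 — how close to "only"
print(recall, precision)
```

The two measures pull against each other: broadening a search tends to raise recall at the cost of precision, and narrowing it does the reverse, which is why evaluation reports both.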
I had the opportunity to apply these concepts of system design and evaluation in a professional setting while taking courses at the San Jose State University School of Information. In my engagement at UCSC as the digital asset manager, I conducted a needs analysis of the video production team and ultimately selected the digital asset management system the organization went forward with. In the evaluation and implementation process, the balance of recall and precision was a constant struggle as I built a metadata schema to support the body of assets while strategically limiting the search to the attributes that were most meaningful to my users. One ongoing challenge was user adoption of, and satisfaction with, the UI, both in the web application and in the widget embedded in the editing platform. The interface was very technical, which enabled experienced users to interact quickly with the information held there, but novice users found the site harder to use and required additional training.
I selected this assignment to support this competency because it demonstrates the considerations in designing, querying, and evaluating an information retrieval system. In this group assignment, my peers and I established a controlled vocabulary of terms, drafted a statement of purpose for the IR system, and determined the data structure and attributes that would be gathered for each item. After creating content records for the system, we conducted a number of test searches, using both simple and advanced Boolean queries, to measure the recall and precision of the system.
I submit this assignment as an example of my understanding of this competency because in this work I thoroughly examine and evaluate the information retrieval system in use at the Smithsonian Repository. I examine the domain of information held in the IR system as well as the technical nuances of its archive database, DSpace. Additionally, in this assignment I explore the metadata structure and collected attributes that support the collections, and I examine the users of the system to identify their needs and preferences.
I selected this assignment in support of this competency because of the thorough evaluation my group did of the website Nomadic Matt. After the executive summary and introduction, we completed a thorough sitemap of the site, identified areas of friction for the user, and proposed a new sitemap that addressed the grouping of like items and page-level design choices that could make the information on the site more easily discoverable. The final section of the assignment reflects the interplay between evaluating an IR system and developing recommendations to improve it.
I selected this assignment as evidence of this competency because in it I examine the steps taken by the FBI in the massive Trilogy / Virtual Case File project at the turn of the century and how their drive to create a sophisticated information retrieval system ultimately ended in failure. Understanding how a project can fail despite a mandate, ample budget, and leadership support is an important element in managing a successful project.
The relationship between managing data so that it is discoverable and managing the system that discovers it is fundamental to the information profession. I have learned that an information retrieval system applies structured-data search methods to discover unstructured data, and that foundation supports other MLIS competencies. Whether identifying metadata values to describe information or vetting platforms to deliver information to users, the lessons learned on the principles of designing, querying, and evaluating the underlying database are applicable and impactful.
Breeding, M. (2015). Managing technology. In S. Hirsh (Ed.), Information services today: An introduction (pp. 130–138). Rowman & Littlefield.
Sanderson, M. (1994). Word sense disambiguation and information retrieval. In B. W. Croft & C. J. van Rijsbergen (Eds.), SIGIR ’94. Springer.
Weedman, J. (2018). Information retrieval: Designing, querying, and evaluating information systems. In K. Haycock & M.-J. Romaniuk (Eds.), The portable MLIS: Insights from the experts (2nd ed., pp. 171–185). Libraries Unlimited.