By Jayne Dutra
Jet Propulsion Laboratory, California Institute of Technology
In the Land of Web 1.0, we would search by looking for a small box in
the top corner of a website. The user would be expected to know a
magical keyword or some other bit of information that would unlock the
door to a cascade of results ready to be winnowed by hand into piles
of carefully hoarded treasure. Publishing to the Web was controlled by
a few individuals called 'webmasters' and data was carefully guarded
behind moats and firewalls in castles called database stores. Search
engines were composed of spiders that crawled the Web to find pages
rendered in HTML, which made them understandable only to advanced
human intellect and not re-use friendly. Search had to 'stink', which
always seemed a bit unsanitary.
Today in a Web 2.0 world
Today things are different. Ordinary people publish blogs and have
passionate electronic conversations in wikis. Data is out and about,
turning up on iPhones, navigational devices in your auto and podcasts.
Bits of content recombine and transform themselves into altered beings
with new formats and sexy, fashionable looks. The Web is a movable
feast with Twitter <http://www.twitter.com> parties materialising
spontaneously as individuals find each other in both virtual and
physical space. New connections from rich social interactions on
YouTube <http://www.youtube.com> and Facebook
<http://www.facebook.com> create vibrant energy that renews human
discourse. Wisdom is collected, syndicated and documented in Wikipedia
and Wikimedia. Rich media, photos, screencasts and other
visualisations are tagged for sharing on Flickr
<http://www.flickr.com> and del.icio.us <http://www.del.icio.us>.
Where is it all going and how do we, as the Web's virtual
cartographers, help others find their way at a time when fellow
travellers are empowered beyond our wildest expectations of only a few
years ago? How can we add value to information retrieval systems
within our organisations that enhances user experience and meets the
increased pace of daily activity and multi-tasking? Web 2.0 has
raised the bar for those of us involved in enterprise search.
Search is complex
More than ever, search developers are required to understand the core
foundation of the organisation's business models. Individual aspects
of services, products and processes are needed in formats that can be
recombined to report past performance, current status and future 'what
if' scenarios. 'Business Intelligence' was once confined to
statistics on last quarter's sales, but now corporations want to
understand why the business performed as it did, what was successful,
what didn't work and how they can develop strategies to capture and
retain market share.
Enterprise search is no longer a one-size-fits-all problem.
Information retrieval is a complex area that is being increasingly
seen as task dependent. In other words, how and why a user searches is
directly related to the type of activity they are engaged in. Therefore
search solutions must be designed around specific business problems
that provide meaningful value to the enterprise. Users have been
trained by Google to expect search results with lightning speed. They
also want high precision without a personal investment in lengthy
exploratory research. In other words, they want information to come to
them no matter where they are or how they are connected to both the
intranet and the world outside the firewall. Indeed, these differences
are blurring more rapidly every day.
There is a cornucopia of new technologies available to help us reach
these goals. IT departments and system developers can choose to
implement company-wide authentication for seamless access to multiple
repositories, enterprise messaging buses for information services,
Semantic Web technologies for embedding relationships and
collaborative portals with personalisation designed by the user alone
or in teams as a natural outgrowth of work activities.
Capturing and leveraging user-generated metadata
Successful enterprise search today doesn't mean making keywords work
well. It means creating a holistic information architecture designed
for the enterprise that allows input and evolution by the users
themselves. Ironically, this usually relies on the time-honoured and
humble practice of generating metadata and controlled vocabularies
that enable data connectedness and intuitive recall. For years, we've
heard that users won't fill out metadata fields. Then how does one
account for the phenomenal success of Flickr? If one enters a set of
bookmarks in del.icio.us, doesn't that tell us something about the
person's interests and background? New Web 2.0 technologies generate
metadata in the wild that can be domesticated if we are wily enough to
recognise the opportunity.
A revitalised corporate IT environment should provide a common entry
point to multiple repositories with single sign-on capability, user
qualification awareness, and a simplified interface. Metadata about
people can be reconciled with metadata about objects and processes to
facilitate personalised content delivery. Knowing an employee's
department and role implies something about the tasks associated with
that employee. Relevant applications, syndicated feeds and better
portlet integration enable customisation of activities and
transactions needed by employees. Data should be available without
regard to device or location, thereby setting the stage for recall on
handheld devices or mobile units.
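
As a rough illustration of reconciling metadata about people with
metadata about objects, the sketch below matches an employee's
department and role against tags on content items. The department
names, roles, tags and the ROLE_INTERESTS table are all hypothetical;
a real deployment would draw them from a directory service and the
enterprise taxonomy rather than a hard-coded dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Employee:
    name: str
    department: str
    role: str


@dataclass(frozen=True)
class ContentItem:
    title: str
    tags: frozenset


# Hypothetical lookup: which subject tags a (department, role) pair implies.
ROLE_INTERESTS = {
    ("Propulsion", "engineer"): {"thrusters", "test-reports"},
    ("Finance", "analyst"): {"budgets", "proposals"},
}


def relevant_content(employee, items):
    """Return items sharing at least one tag with the employee's implied
    interests, ordered by how many tags overlap (most first)."""
    interests = ROLE_INTERESTS.get((employee.department, employee.role), set())
    scored = [(len(item.tags & interests), item) for item in items]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [item for score, item in ranked if score > 0]


items = [
    ContentItem("Thruster test summary", frozenset({"thrusters", "test-reports"})),
    ContentItem("FY budget", frozenset({"budgets"})),
    ContentItem("Thruster overview", frozenset({"thrusters"})),
]
engineer = Employee("A. Example", "Propulsion", "engineer")
for item in relevant_content(engineer, items):
    print(item.title)
```

An employee whose department and role are not in the table simply
receives no personalised items, which is a deliberately cautious
default for a sketch like this.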
The corporate information environment should be accessible to machines
as well as individuals and should use a common data reference
model for improved data consistency. Federated searches, contextual
results and composite data sets are all possible. Using new tools,
users can enter metadata directly in the browser, where it can be
displayed in tag clouds and saved in a personal portlet. Searches can be
saved for individual or team use and subscribed to as an ongoing service.
Results can be shown graphically in charts or plots, according to
personal preference. Browsing by image, video clip or text is now
interrelated, and the results can be presented together for wider access.
Foundation pieces and strategic approaches
In order to achieve the seamless integration of data to build our
brave new world, a semantic layer that handles data reconciliation and
unification of content sources is needed. Most experts recommend
starting by understanding the business uses of content and creating a
semantic representation of the target data that allows for
recombination and presentation in a variety of outlets. The
representation of enterprise data is expressed by the enterprise
metadata specification and its associated taxonomy. One of the
search team's foundational tasks is to work with engineering
system owners to see that the metadata core specification is
incorporated into the searchable index. Working with system owners to
coordinate data values can be phased over time. Early phases include
mapping data fields to the enterprise standard in order to give
systems time to adopt standards. Opportunities for systems to
incorporate the standards arise when there is a major upgrade of the
system or replacement of the system's technology.
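
The early mapping phase described above might look something like the
following sketch, in which each source system keeps its own field
names and a mapping table translates them to the enterprise standard
at indexing time. The system names ("drawing_vault", "wiki") and all
field names are invented for illustration and are not taken from any
actual specification.

```python
# Hypothetical per-system mappings from local field names to the
# field names of an assumed enterprise metadata standard.
FIELD_MAPS = {
    "drawing_vault": {
        "doc_title": "title",
        "author_name": "creator",
        "rev_date": "date",
    },
    "wiki": {
        "page_name": "title",
        "last_editor": "creator",
    },
}


def to_enterprise_record(system, record):
    """Translate one source record to enterprise field names.

    Returns (mapped, unmapped): fields that have a standard equivalent,
    and fields that do not yet, so gaps stay visible to the search team.
    """
    mapping = FIELD_MAPS[system]
    mapped, unmapped = {}, {}
    for key, value in record.items():
        if key in mapping:
            mapped[mapping[key]] = value
        else:
            unmapped[key] = value
    return mapped, unmapped


mapped, unmapped = to_enterprise_record(
    "wiki", {"page_name": "Search FAQ", "last_editor": "jdoe", "views": 12}
)
print(mapped)
print(unmapped)
```

Keeping the unmapped fields rather than discarding them gives system
owners a running list of what remains to be aligned when the next
upgrade comes around.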
Content resides in many places and in many formats. Unstructured data
may be appropriate for natural language processing and entity data
extraction that facilitate automated tagging. Folksonomies and tag
clouds are examples of human tagging. The proposed solution set for an
enterprise search task should encompass both these approaches. Objects
will be tagged over time through both automated and human actions
using the concepts around the Unstructured Information Management
Architecture (UIMA).
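
To make the combination of automated and human tagging concrete, here
is a minimal sketch that merges the two and records where each tag
came from. It does not use UIMA itself: the auto_tags function is a
simple stand-in that matches words against a hypothetical list of
known entities, where a real pipeline would run natural language
processing and entity extraction.

```python
import re
from collections import defaultdict

# Hypothetical list of entities an automated extractor might recognise.
KNOWN_ENTITIES = {"mars", "thruster", "telemetry"}


def auto_tags(text):
    """Stand-in for entity extraction: words that match known entities."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return words & KNOWN_ENTITIES


def merge_tags(text, human_tags):
    """Combine automated and human tags, keeping the source of each."""
    sources = defaultdict(set)
    for tag in auto_tags(text):
        sources[tag].add("auto")
    for tag in human_tags:
        sources[tag.strip().lower()].add("human")
    return dict(sources)


tags = merge_tags(
    "Telemetry from the Mars thruster test",
    ["Mars", "lessons-learned"],
)
for tag, origin in sorted(tags.items()):
    print(tag, sorted(origin))
```

Recording the origin of each tag lets a search index weight tags
confirmed by both a person and a machine differently from those
supplied by only one of them.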
Instead of implementing a Web crawler to randomly generate search
results on arbitrary keywords, the approach of the modern enterprise
search team is to leverage a strong information architecture
infrastructure resulting in a unification layer for enterprise
content. By utilising enterprise metadata standards, deploying
reconciliation strategies with gold source vocabularies and building a
clearinghouse for data collection, order can be brought to a chaotic
information environment.
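
One way to picture reconciliation against a gold source vocabulary is
the sketch below, which maps free-form tags to preferred terms and
sets aside anything it cannot match. The vocabulary entries and their
variants are hypothetical examples; an actual gold source would be
maintained by the taxonomy owners.

```python
# Hypothetical gold source: preferred term -> accepted variants.
GOLD_VOCABULARY = {
    "Spacecraft": {"spacecraft", "space craft", "s/c"},
    "Propulsion": {"propulsion", "thrusters", "engines"},
}

# Reverse index so each variant resolves to its preferred term.
VARIANT_TO_TERM = {
    variant: term
    for term, variants in GOLD_VOCABULARY.items()
    for variant in variants
}


def reconcile(tags):
    """Map free-form tags to preferred terms.

    Returns (matched, unmatched): preferred terms found, and normalised
    tags with no match, which can be queued for review by a taxonomist.
    """
    matched, unmatched = set(), set()
    for tag in tags:
        key = " ".join(tag.lower().split())  # lowercase, collapse spaces
        term = VARIANT_TO_TERM.get(key)
        if term is not None:
            matched.add(term)
        else:
            unmatched.add(key)
    return matched, unmatched


matched_terms, unmatched_tags = reconcile(
    ["S/C", "Space  Craft", "thrusters", "budget"]
)
print(sorted(matched_terms))
print(sorted(unmatched_tags))
```

The unmatched set is where user-generated tags can feed back into the
vocabulary, since recurring entries there are candidates for new
preferred terms or variants.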
The ultimate goal is an information environment enhanced by metadata
and served up through a number of rich user interactions facilitated
by role-based access. Unified enterprise search at my organisation is
conceived of as a set of integrated systems utilising different types
of technologies to provide information quickly and represented with a
variety of visualisation techniques including charts, sliders for
query definition, and thumbnails of engineering drawing families.
Better information re-use brings numerous benefits to the enterprise,
including a higher percentage of winning proposals,
shorter product development time, more effective resource management,
better decision making and improved business agility. These benefits
combine to make a stronger competitor in the marketplace and generate
more success in the long run. That's a business case our managers
can't afford to ignore.
The research described in this publication was carried out at the Jet Propulsion Laboratory, under a contract with the National Aeronautics and Space Administration.
Jayne Dutra has worked at the Jet Propulsion Laboratory for the last
10 years, managing software development tasks in the areas of Web
content management, search and portals. Her experience led her to
believe that no enterprise search effort would be truly successful
without a foundation layer of information architecture and
standardised metadata, and she became interested in taxonomies. She
subsequently worked on a Project Engineering Taxonomy for JPL space
exploration teams and the development of the JPL Business Domain
Taxonomy. Jayne currently serves as the Lead Enterprise Information
Architect for JPL.
Copyright 2008 Free Pint Ltd.