INNODATA ISOGEN                     CASE STUDY


The U.S. Exploring Expedition of the Pacific, led by
Captain Charles Wilkes from 1838 to 1842, produced
a veritable ocean of data. The leading scientists and
artists of the day sailed on a mission to collect,
preserve and document anything of value to natural
historians throughout the Pacific Ocean. They logged
volumes of notes and drawings, collecting nearly
2,400 anthropological artifacts and 50,000 plant

Crisscrossing the Pacific, Wilkes's expedition
established that Antarctica is a continent, mapped
South America's coast and the Columbia River basin,
charted several Pacific Island groups and researched
Hawaii's volcanoes. The accuracy of the maps helped
guide U.S. forces in the Pacific to victory during World
War II.

Although the expedition has been largely forgotten,
the volume of data was staggering – five volumes of
narrative descriptions, 15 volumes of published
scientific and anthropological documents, plus four
additional volumes that had never been published. In
all, the Smithsonian received 1,600 pieces in 1858.
Now, the Smithsonian wanted to make these 160-year-
old records of flora, fauna, geography and
meteorology available to modern researchers through
its Galaxy of Knowledge portal.


Because much of the vast collection required labor-
intensive transcription and document linking, the
Smithsonian knew that it needed to partner with an
offshore content services provider to create the digital
archive. Moreover, the documents needed to be
converted with a high level of accuracy to ensure the
material's usefulness to scholars. From that
perspective, the Smithsonian's decision to partner
with Innodata Isogen, a leader in digitizing content,
was a logical choice.


Each page of the printed volumes was scanned with
optical character recognition software and the text
files were checked against the original by the
Innodata Isogen team to ensure complete accuracy of
the conversion. The text files were then converted into
accessible XML data files. Data elements within the
pages were tagged and coded to allow the information
to be matched to the document type definition (DTD)
system that the Smithsonian has established for its on-
line content.
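
As a rough illustration of the tagging step described above, the sketch below wraps one proofread OCR page in XML elements and checks that the expected elements are present. The element names (`page`, `volume`, `number`, `text`) and the helper functions are hypothetical, since the Smithsonian's actual DTD is not reproduced in this case study, and the simple presence check stands in for true DTD validation.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names; the real DTD is not shown in this case study.
REQUIRED_CHILDREN = ("volume", "number", "text")


def page_to_xml(volume: str, number: int, text: str) -> ET.Element:
    """Wrap one proofread OCR page in a simple XML record."""
    page = ET.Element("page")
    ET.SubElement(page, "volume").text = volume
    ET.SubElement(page, "number").text = str(number)
    # Collapse the hard line breaks left over from the scanned page.
    ET.SubElement(page, "text").text = " ".join(text.split())
    return page


def missing_children(page: ET.Element) -> list[str]:
    """Return required child elements that are absent or empty."""
    return [
        name
        for name in REQUIRED_CHILDREN
        if page.find(name) is None or not (page.find(name).text or "").strip()
    ]


# Sample values are placeholders, not actual catalog data.
record = page_to_xml("Narrative, Vol. 1", 12, "Crisscrossing the\nPacific")
xml_string = ET.tostring(record, encoding="unicode")
```

A record that passes the check can be serialized and handed to the next stage; one that fails returns the names of the missing elements so the page can be sent back for correction.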

Photos of more than 2,000 artifacts and hundreds of
pages of drawings and illustration plates were
digitized, tagged and coded using the Smithsonian's
DTD system to allow the entire collection to be
searched with key words. Throughout the process,
Smithsonian scholars checked each page and
illustration for accuracy and to establish an order for
the on-line presentation of the collection. They
developed descriptions of the sailing vessels and the
600-plus crew of sailors and scientists from the
collection and other resources.

Provide scholars and the general public with ready
access to records from the first U.S.-sponsored
exploration of the globe


Partner with Innodata Isogen to create a virtual
library of interactive text and images on the Internet


Scientists and historians worldwide can mine the
data for hidden discoveries, and the Smithsonian's
stature grows in the emerging field of digital

When the site launched in early 2004,
visitors could read an overview of
the expedition and then choose whether to
further explore the narrative texts, scientific
texts, plates or supplemental material and
resources. Narrative and scientific texts and
plates can be viewed as JPG files or as
printable PDF pages that are exact copies of
the originally published reports.

In addition, the supplemental material and
resources section contains photos of more
than 2,000 artifacts, powered by a search
engine that enables researchers to review the
entire collection through the use of key words.
For example, a herpetologist can compare the
salamanders of South America and Samoa as
easily as a geologist can study the differences
of rock strata from the Columbia River with
those of Antarctica.
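
The key-word lookup described above can be pictured as a small inverted index. This is only a sketch: the portal's actual search engine is not described in this case study, and the record IDs and descriptions below are invented placeholders rather than items from the collection.

```python
import re
from collections import defaultdict

# Placeholder records; not actual catalog entries from the collection.
RECORDS = {
    "rec-1": "Salamander specimen, South America",
    "rec-2": "Salamander specimen, Samoa",
    "rec-3": "Rock strata sketch, Columbia River",
}


def build_index(records: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercase word to the IDs of records containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for record_id, description in records.items():
        for word in re.findall(r"[a-z]+", description.lower()):
            index[word].add(record_id)
    return index


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Return sorted IDs of records that contain every word in the query."""
    words = re.findall(r"[a-z]+", query.lower())
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches)


index = build_index(RECORDS)
salamanders = search(index, "salamander")
```

Requiring every query word to match keeps results narrow, so a query such as "salamander samoa" returns only the records that mention both terms.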

IMPACT

When completed after eight weeks, the project
put crumbling, yellowed pages, once off-limits
to all but dedicated scholars, just a mouse-click
away for any researcher with a computer. Each
month, more than three million
people visit the Smithsonian's Galaxy of
Knowledge portal, giving this oft-forgotten
expedition the public spotlight it deserves.

Scientists can now compare 160-year-old
descriptions with current data to identify
changes in the flora, fauna, geology and
meteorology throughout the Pacific. Digitizing
the entire collection also ensures its
preservation and establishes protocols that will
help other Smithsonian archivists create
similar virtual museums of their collections, which
can now be cross-referenced to facilitate
multi-disciplinary research.

Moreover, the steady stream of online visitors
who come to explore the expedition's
discoveries furthers the Smithsonian's
reputation as a leader in the effort to digitize
historical records.