This work was part of my Master's thesis at New York University.
The sections below are excerpts from a larger paper. If you are interested
in reading the full document, please email me.
Introduction
Recent advances in storage technology allow users
to amass large quantities of documents. Other trends in consumer
electronics and the Internet have led to a proliferation of digital
photographs, movies and music. Furthermore, users must also manage
documents they do not create, such as those received by email or
downloaded from the Internet. These factors lead to an organizational
overload that is a tremendous burden on computer users, and in particular,
presents a significant obstacle for continued and effective use
by novice users [4,7,9].
Hierarchical filing systems, on which the digital
equivalents are based, are widely used. However, it is not clear
that this method of organization is optimal. Computer systems allow
us to break away from limitations in the real world and provide
new, physically impossible and powerful ways to access media. To
better target our efforts in developing a new file navigation and
management paradigm, it was important to first consider inherent
drawbacks of hierarchical file systems. We identify three general
categories:
Organization and Overload
In order for hierarchical systems to be effective
management schemes, users must be diligent and spend time to organize
documents appropriately. If documents are not organized, or worse,
incorrectly arranged, the system can become more unwieldy than a
flat file system. Also, as directories become saturated with files,
users create sub-folders to partition documents into smaller and
more manageable sets. As users create and acquire additional files,
maintaining and navigating these increasingly deep organizational
structures becomes complicated and time-consuming.
Naming Ambiguity
Effective naming is vital for maintaining easy-to-navigate
hierarchies. The user, rather than remembering the entire hierarchical
structure, can scan over a list of directory names and choose the
one that is most applicable to the target document (e.g. resumes).
However, this system becomes clumsy when directories are poorly
or ambiguously named.
Users also rely on informative names to differentiate
between files. Inconsistent and vague naming leads to confusion,
potentially requiring several files to be opened before the correct
one is located, even within a single directory. Many desktop environments
remedy this problem for image files by generating thumbnails. Image
names, which are typically cryptic (e.g., IMG_1092.jpg), are no
longer essential because the thumbnails provide sufficient distinction.
However, current systems do not provide an effective way to see
the contents of other media, especially text, without opening them.
Versioning
Users create versions of documents for two reasons: they provide
a safety net for accidental modifications and offer a history of
documents for reference. Users sometimes rely on simple naming conventions,
such as dating, lettering or numbering (e.g., resume1.doc, resume2.doc).
However, this system requires users to be consistent and accurate
in the creation and naming of versions to be reliable.
Time-Centric File Organization
One way to alleviate the difficulties associated
with hierarchical filing systems is to avoid hierarchies altogether.
As noted by Rekimoto, personal activities are tightly coupled with
the flow of time, providing an obvious mechanism for automatic organization
[10]. Coincidentally, considerable temporal information is produced
as a byproduct of regular computer use - file creation and modification
times can be readily captured. Although these factors create an
immediately favorable platform on which to organize files, there
are also numerous universal cognitive abilities that can be leveraged
to great advantage.
Foremost, users have excellent memory for when,
roughly, documents were created or edited [2,5]. This makes a timeline,
where users can rapidly move back and forward through time and set
the temporal extent (i.e., the length of the time period), an obvious
navigational mechanism. Additionally, humans are particularly adept
at remembering the chronology of items [7]. Thus, during navigation,
other documents can serve as temporal signposts.
Moreover, a time-based visualization has the
natural ability to accentuate temporal relationships between files,
especially clusters of file edits or creations. For example, a series
of HTML documents and images created in close temporal proximity
might comprise work relating to a single website. Furthermore, photographs
(or movie clips) taken at roughly the same time are likely to have
been captured at the same location or event. A timeline is also
useful for projects with finite time spans, as users can simply
set the timeline’s extent and view all material created during
that period.
These clustering and ordering clues serve as
a temporal context, which has been shown to substantially improve
recall and file recognition accuracy [11]. Files that surround a
target document often reveal what the file is about, and how and
why it was created or modified. It should be noted that computer-aided
work is highly fragmented [8], which will break up continuous, project-level
file creations and modifications. However, interleaved documents
from other jobs often enhance the temporal context because humans
have excellent recollection for parallel tasks [2,5].
Previous research has noted that users do not
place an emphasis on old documents, although archiving can be useful
[1,3]. A timeline intrinsically supports this behavior by automatically
diminishing the presence of old files simply by rendering them in
the past. Recently accessed files, and the ones most likely to be
relevant, are located near the present.
Kronosphere

An overview of the Kronosphere interface:
A) content-driven search menu, B) buttons to quickly navigate
to common temporal extents (e.g., current day, month), C)
timeline visualization, D) scrollbar for seeking the timeline,
and E) file information pane.
Our
investigation of hierarchical file systems and time-centric document
organization and navigation revealed several areas that have been
under-explored: 1) visualizing and highlighting temporal context,
2) file versioning, 3) keyword tagging, and 4) tightly integrating
time and content into a unified search mechanism. Although Kronosphere
boasts a comprehensive array of features, the system description
will primarily concentrate on elements that address these particular
issues.
Visualization
Kronosphere uses a timeline-based visualization.
Each time a file is saved to the system, either through creation
or modification, a new entry is created and displayed on a timeline.
Temporal distances between files are preserved visually. This method
emphasizes important temporal relationships between files. For example,
a cluster of quick edits or a period of downtime between two activities
would be readily identifiable. Furthermore, the timeline provides
an intuitive and unobtrusive versioning mechanism. Each time a file
is modified, a new instance of the document is attached to the timeline
(including content). This allows one to navigate the timeline and
see each modification, the earliest instance being the file’s
creation.
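The versioning behavior described above can be sketched as an append-only list of entries kept in time order. This is a minimal Python illustration, not Kronosphere's Java implementation; the names Entry, Timeline, record_save, history and latest are assumptions introduced here.

```python
from bisect import insort
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(order=True)
class Entry:
    saved_at: datetime                      # when this version was saved
    path: str = field(compare=False)        # file the version belongs to
    content: bytes = field(compare=False)   # snapshot of the file at save time


class Timeline:
    """Append-only record of file saves, kept sorted by save time."""

    def __init__(self):
        self.entries = []

    def record_save(self, path, content, saved_at):
        # Every save adds a new entry; earlier versions are never overwritten.
        insort(self.entries, Entry(saved_at, path, content))

    def history(self, path):
        # All versions of one file, oldest first; the first is its creation.
        return [e for e in self.entries if e.path == path]

    def latest(self, path):
        versions = self.history(path)
        return versions[-1] if versions else None
```

Because entries are only ever added, walking history(path) from first to last replays each modification of a file, starting with its creation.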
Another feature unique to Kronosphere is the
ability to view the timeline in a linear or exponential mode. The
linear view represents all time linearly, such that a unit of time
is represented by a fixed amount of space. Alternatively, the exponential
view scales time in a decaying manner, such that a unit of time
becomes smaller the further in the past it is located. This simple
feature has a nice effect: older files bunch up, while newer documents
are more spread out, allowing them to be more readily recognized.
The latter mode was developed under the assumption that newer files
and versions of files are often more relevant than older ones, a
view supported by previous research [1,3].
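One possible way to express the two modes is as a mapping from a timestamp to a horizontal position. The sketch below is an assumption about how such a mapping could look, not the formula Kronosphere uses; the function name timeline_x and the decay parameter are hypothetical.

```python
import math


def timeline_x(t, start, now, width, mode="linear", decay=3.0):
    """Map timestamp t in [start, now] to a horizontal position in [0, width]."""
    age = (now - t) / (now - start)          # 0 = present, 1 = oldest visible
    if mode == "linear":
        # A unit of time always occupies the same amount of space.
        scale = 1.0 - age
    elif mode == "exponential":
        # Normalized exponential decay: equals 1 at age 0 and 0 at age 1,
        # so a unit of time shrinks the further in the past it lies.
        scale = (math.exp(-decay * age) - math.exp(-decay)) / (1.0 - math.exp(-decay))
    else:
        raise ValueError(f"unknown mode: {mode}")
    return width * scale
```

With this mapping, the most recent tenth of the visible period takes up more pixels than the oldest tenth in exponential mode, while both take up the same space in linear mode. A larger decay value bunches old files more tightly.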
Keyword Tagging
Kronosphere allows users to attach keywords to
their documents. This enables users to create their own folksonomies,
tagging documents in a similar way to successful systems like del.icio.us
and Flickr. This feature, coupled with a search engine, provides
a powerful and flexible organization system. Additionally, the keywords
alleviate naming ambiguity, one of the key deficiencies seen in
hierarchical file systems. Even documents with similar content,
such as versions of the same file, can be tagged in a way to successfully
differentiate them.
However, expecting users to tag all of their
files is unrealistic. Thus, Kronosphere automatically generates
keywords for documents as they are added to the timeline. For text
documents, a text analytics package developed by Jeff Borden of
New York University is used to extract significant words, phrases
and entities. This is achieved through a combination of TF-IDF statistical
analysis (Term Frequency – Inverse Document Frequency) and
linguistic analysis (named entity extraction and part of speech
tagging). Other file types are supported as well. For example, mp3
files are tagged with their ID3 tags (e.g., song, genre, artist, year)
and images are tagged with their primary colors.
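To make the statistical half of this step concrete, the following is a small TF-IDF keyword ranker in Python. It is only a sketch under simple assumptions (lowercase word tokens, raw term frequency, logarithmic inverse document frequency). It does not reproduce the text analytics package mentioned above and leaves out the linguistic analysis entirely; tokenize and tfidf_keywords are hypothetical names.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def tfidf_keywords(docs, doc_id, top_n=3):
    """Return the top_n terms of docs[doc_id] ranked by TF-IDF."""
    tokens = {name: tokenize(text) for name, text in docs.items()}
    # Document frequency: number of documents containing each term.
    df = Counter(term for words in tokens.values() for term in set(words))
    counts = Counter(tokens[doc_id])
    total = sum(counts.values())
    scores = {
        term: (n / total) * math.log(len(docs) / df[term])
        for term, n in counts.items()
    }
    # Sort by descending score, breaking ties alphabetically for stable output.
    return sorted(scores, key=lambda term: (-scores[term], term))[:top_n]
```

Terms that appear in every document score zero, so common words fall to the bottom of the ranking without an explicit stop-word list.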
Search
Desktop search interfaces are a popular and successful
extension to the desktop metaphor. Notable systems include Google’s
Desktop Search, released in 2005, and Apple’s Spotlight feature,
which débuted in Mac OS X 10.4. Similarly, Kronosphere offers
a rich content-search interface, including the ability to execute
full-text searches. Kronosphere’s inclusion of an additional
dimension, time, only helps to refine the search and produce more
accurate results. When the result list is returned, our system’s
natural temporal clustering and ordering clues allow users to quickly
home in on the desired file or particular version of a file. Additionally,
keywords from documents in the result set can be quickly added to
refine the search and further reduce the number of hits.
Content and metadata search features are primarily
offered through a menu located above the timeline (see Figure, Label
A). In addition to full text search, users can also search by keyword,
file name, and file type. Kronosphere also offers methods for accessing
document versions, including the ability to see the entire version
history of a file or to jump to the most recent version. Users can
also search for related content given a target file. This is achieved
by locating other documents that have similar tags. Additionally,
a prototype content-based image retrieval system was used to generate
image metadata that allowed the visual content to be searchable.
Specifically, given a target image, users could find images similar
in visual composition.
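The related-content search based on similar tags could be approximated by comparing tag sets. The source does not say which similarity measure is used; the sketch below assumes Jaccard similarity, and the names jaccard and related are introduced here for illustration only.

```python
def jaccard(a, b):
    """Overlap between two tag sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def related(target, tags_by_file, limit=5):
    """Rank other files by how similar their tags are to the target's."""
    target_tags = tags_by_file[target]
    scored = [
        (jaccard(target_tags, tags), name)
        for name, tags in tags_by_file.items()
        if name != target
    ]
    # Keep only files sharing at least one tag, most similar first;
    # ties are broken alphabetically by file name.
    ranked = sorted((p for p in scored if p[0] > 0), key=lambda p: (-p[0], p[1]))
    return [name for _score, name in ranked[:limit]]
```

Files with no tags in common with the target are dropped rather than ranked last, which keeps the result list short.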
Interaction
In Kronosphere, users can move backwards and
forwards through time using a horizontal scroll bar located at the
bottom of the timeline (see Figure 6, Label D). Users can also click
on a document or in the whitespace between documents to center the
timeline and focus on the corresponding date.
The ability to change the temporal extent is
also critical to effective navigation of time-space. Kronosphere
provides several mechanisms: First, three buttons provide quick
access to commonly used periods – the current day, week, and
month (see Figure 6, Label B). Second, users can right-click a document
and select the extent of the surrounding time period. This ranges
from a minute to a month in duration, and allows users to quickly
focus on a particular period and access other files created and
modified around the same time. Third, the mouse can be used to control
the temporal extent. Double clicking not only centers the timeline
on the corresponding time, but also reduces the temporal extent
(analogous to zooming in). Lastly, the scroll wheel can be used
to zoom in and out of time as well.
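These interactions all reduce to changing two values: the time at the center of the timeline and the temporal extent. A minimal Python sketch of that state follows; the Viewport class, its method names, and the zoom factors are assumptions and do not come from Kronosphere itself.

```python
from dataclasses import dataclass


@dataclass
class Viewport:
    center: float   # timestamp at the middle of the timeline, in seconds
    extent: float   # length of the visible time period, in seconds

    def bounds(self):
        half = self.extent / 2
        return self.center - half, self.center + half

    def center_on(self, t):
        # Single click: recenter without changing the extent.
        return Viewport(t, self.extent)

    def zoom(self, factor, t=None):
        # factor < 1 zooms in (shorter extent); factor > 1 zooms out.
        # Passing t also recenters, as a double click would.
        return Viewport(self.center if t is None else t, self.extent * factor)
```

Under these assumptions a double click on time t would be view.zoom(0.5, t), a scroll-wheel step would be view.zoom(0.9) or view.zoom(1.1), and the preset buttons would simply construct a Viewport with a day, week, or month as the extent.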
Kronosphere limits the number of documents that
can be seen at any given time (typically set between 10 and 50).
The reason for this restriction is twofold. First, an abundance
of files will cause the timeline to become too cluttered to be useful.
Second, and most importantly, users have difficulty visually scanning
and mentally processing large quantities of files. Instead, the
interface encourages users to refine their search, either by using
temporal context clues to narrow the temporal extent (i.e., zooming
in) or using content clues, such as keywords, to add relevant terms
to the search query. This multi-dimensional and iterative search
approach rapidly reduces the number of possible matches in addition
to providing a wealth of information about potentially related items.
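The iterative refinement described above can be sketched as a single filter over time and keywords with a cap on the number of results. The Python below is illustrative only: the tuple layout, the function name visible, and the choice to keep the most recent matches when the cap is exceeded are assumptions, since the source does not specify them.

```python
def visible(entries, start, end, required_tags=frozenset(), limit=50):
    """Return entries inside [start, end] that carry all required tags.

    Each entry is a (timestamp, name, tags) tuple. The second return value
    is True when more than `limit` entries matched, signalling that the
    user should narrow the extent or add keywords.
    """
    matches = [
        e for e in sorted(entries, key=lambda e: e[0])
        if start <= e[0] <= end and required_tags <= e[2]
    ]
    # Assumption: when over the cap, show the most recent matches.
    return matches[-limit:], len(matches) > limit
```

Each refinement, whether a shorter extent or an extra keyword, can only shrink the match list, so repeated refinement converges on a set small enough to scan visually.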
Architecture
In order to minimize impact, the current version
of the application is designed to run alongside the user’s
existing operating system, hierarchical file structure and applications.
The current Java-based version runs on Windows, Linux and MacOS.
Kronosphere is composed of three major components.
The client, which is the primary focus of this paper, provides a
thin, but rich interface in which users can search a central database.
This database can be local or remote; the latter affording users
the possibility to share files (and versions) collaboratively. The
final component is a daemon that monitors a user’s hierarchical
file system for changes. When a new file or modification to an existing
file is detected, the file is processed and a new record is created
in the database. Keywords are extracted during this process.
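The daemon's change detection could work in several ways, and the source does not say which is used. The sketch below assumes simple polling that compares modification times between scans; it is written in Python for brevity rather than in the Java of the actual system, and the function name scan and the seen dictionary are hypothetical.

```python
import os


def scan(root, seen):
    """Return files under root that are new or modified since the last scan.

    `seen` maps path -> last observed modification time (nanoseconds)
    and is updated in place.
    """
    changed = []
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            try:
                mtime = os.stat(path).st_mtime_ns
            except OSError:
                continue  # file vanished between listing and stat
            if seen.get(path) != mtime:
                seen[path] = mtime
                changed.append(path)
    return sorted(changed)
```

A daemon built on this would call scan periodically and, for each returned path, extract keywords and add a new record to the database. Polling is the simplest option to illustrate; operating-system file notification facilities would avoid rescanning the whole tree.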
References

[1] Barreau, D. and Nardi, B. A. Finding and
reminding: file organization from the desktop. SIGCHI Bulletin 27,
3, 39-43, July 1995.
[2] Blanc-Brude, T. and Scapin, D. L. What do
people recall about their documents?: Implications for desktop search
tools. In Proceedings of the 12th international Conference on Intelligent
User Interfaces, pages 102-111. ACM Press, New York, NY, 2007.
[3] Fertig, S., Freeman, E., and Gelernter, D.
“Finding and reminding” reconsidered. SIGCHI Bulletin
28, 1, 66-69, January 1996.
[4] Freeman, E. and Gelernter, D. Lifestreams:
a storage model for personal data. SIGMOD Rec. 25, 1, 80-86, March
1996.
[5] Gonçalves, D. and Jorge, J. A. Describing
documents: what can users tell us? In Proceedings of the 9th international
Conference on Intelligent User Interfaces, pages 247-249. ACM Press,
New York, NY, 2004.
[6] Krishnan, A. and Jones, S. TimeSpace: activity-based
temporal visualisation of personal information spaces. Personal
Ubiquitous Computing. 9, 1, 46-65, January 2005.
[7] Lansdale, M. The psychology of personal information
management. Applied Ergonomics, 19, 1, 55-66, 1988.
[8] Mark, G., Gonzalez, V. M., and Harris, J.
No task left behind?: Examining the nature of fragmented work. In
Proceedings of the SIGCHI Conference on Human Factors in Computing
Systems, pages 321-330. ACM Press, New York, NY, 2005.
[9] Marsden, G. and Cairns, D. Improving the
Usability of the Hierarchical File System. South African Computer
Journal, 32, 1, 69-78, 2004.
[10] Rekimoto, J. Time-machine computing: a time-centric
approach for the information environment. In Proceedings of the
12th Annual ACM Symposium on User interface Software and Technology,
pages 45-54. ACM Press, New York, NY, 1999.
[11] Soules, C. A. and Ganger, G. R. Connections:
using context to enhance file search. In Proc. of the Twentieth
ACM Symposium on Operating Systems Principles, pages 119-132. ACM
Press, New York, NY, 2005.