The National Science Foundation (NSF)
recently awarded Louisiana State University
professor Tevfik Kosar a half-million-dollar
grant to support development of the
Stork Data Scheduler.
As applications and experiments in all areas
of science become increasingly complex and
more demanding in their computational and
data requirements, some applications now
generate data volumes reaching hundreds of
terabytes and even petabytes. Sharing,
disseminating, and analyzing these petascale
data sets is a major challenge, especially
when distributed resources are used. Even
though many regional and national optical
networking initiatives, such as LONI, ESnet,
and TeraGrid, provide high-speed network
connectivity to their users, the majority of
users still fail to obtain even a fraction of
the theoretical speeds promised by these
networks because of mismanaged end-to-end
data placement.
Traditional distributed computing
systems closely couple data placement and
computation. They treat data resources as
second-class entities, and access to data as
a side effect of computation. This makes the
remote access and retrieval of data the main
bottleneck in the end-to-end performance,
reliability and automation of large-scale
data-intensive and dynamic data-driven
applications.
Kosar’s NSF project, funded through the
foundation’s Strategic Technologies for
Cyberinfrastructure (STCI) program, will
further develop and enhance the Stork Data
Scheduler to mitigate the end-to-end data
handling bottleneck in petascale distributed
computing systems and make it available to a
broad user community as production-quality
software.
The Stork Data Scheduler makes a distinctive
contribution to the distributed computing
community because it focuses on planning,
scheduling, monitoring, and management of
data movement tasks and data resources.
Unlike existing approaches, Stork treats
data resources and the tasks related to data
access and movement as first-class entities,
just like computational resources and
compute tasks, rather than as a side effect
of computation.
Earlier this year, Kosar, who holds a joint
faculty appointment with the LSU Center for
Computation & Technology (CCT), also received
an NSF CAREER Award for his project titled
“Data-aware Distributed Computing for
Enabling Large-scale Collaborative Science.”
The NSF CAREER Award is the foundation’s
most prestigious award for junior faculty
members. It is part of NSF’s Faculty Early
Career Development Program, which
“recognizes and supports the early
career-development activities of those
teacher-scholars who are most likely to
become the academic leaders of the 21st
century.” CAREER Award recipients are
selected on the basis of creative
career-development plans that effectively
integrate research and education within the
context of the missions of their
institutions.
Kosar’s CAREER grant establishes the
theoretical foundation for data scheduling
through the development of novel mathematical
models and algorithms. The recent STCI grant
takes these models and algorithms, implements
them in production-quality scheduling
software, and makes them available to a wide
range of scientific communities.
Enhanced functionality of the Stork
scheduler will include: data aggregation and
caching; peer-to-peer and streamed data
management; early error detection,
classification, and recovery; job delegation
and distributed data scheduling; integration
with workflow planning and management;
scheduled storage management; optimal
protocol tuning; and end-to-end performance
prediction services.
The Stork Data Scheduler is considered a
highly transformative project because of its
potential to dramatically change how domain
scientists perform their research and to
facilitate the rapid sharing of experience,
raw data, and results. Future applications
will be able to rely on Stork to manage
storage and data movement reliably and
transparently over a variety of storage and
transfer protocols, thus eliminating
unnecessary failures of distributed tasks.
In December 2008, Kosar led a team of
researchers who unveiled the first prototype
of the Stork Data Scheduler (v1.0) and made
it available to the scientific community
(www.storkproject.org).
Data storage and management is Kosar’s
research specialty at the University. In
2006, he received a $1 million grant from
NSF to create advanced data archival,
processing and visualization capabilities
across the state through the PetaShare
project (www.petashare.org).
For more information, please visit:
http://www.supercomputingonline.com/