Capturing and Accessing Complex Time-Sequenced Instrument Data With HDF5
The proliferation of sensors and other instruments introduces enormous challenges to data management. Even for a single event, incoming synchronized time-sequenced data can have many sources. The number of incoming data streams, as well as the types of data, can be large. In our work on flight test data applications, for instance, we must accommodate data from test aircraft, voice communications, video, ground, satellite tracking, and other sources. Virtually all of this data must be gathered, integrated, processed, visualized, and archived. Developing systems to manage these data sources is complex, time-consuming, and expensive. Similar scenarios exist for many different applications, such as environmental monitoring, vehicle testing, and medicine, and very similar data management infrastructures are developed, over and over, to solve these problems. There is surprisingly little sharing of these infrastructure technologies even within an application area, let alone across application domains, resulting in frequent and costly re-invention of the same technologies.

One key infrastructure component is the one that organizes and accesses data in databases or files. Typically, each type of data has its own format, its own access library, and specialized codes for integrating, analyzing, and visualizing the data. If a single format could be used for organizing most of this data in a standard, efficient way, and if software could be provided to work with that data, many of the inefficiencies of roll-your-own technologies could be avoided, saving time and money and encouraging more and better software and tools for working with the data.

HDF5 was designed to meet this need. In a single package, HDF5 provides many of the capabilities that otherwise have to be developed from scratch. These include the ability to store virtually any kind of scientific or engineering data and to mix any number of objects of different types in a single container, support for different access patterns, simplified data integration, datatype translation, fast I/O, and visualization and analysis software.

In a project with Boeing, HDF5 is being used as a standard for managing time-sequenced flight test data, as described above. In this proposal, we seek to build on that work, to further the usefulness of HDF5 as a data management technology for a broad range of instrumentation applications. Three capabilities have been identified as key to making this happen:

1. First, there needs to be a general data model and API for storing and accessing complex real-time data. With this model and API, applications could quickly be built to take advantage of HDF5 capabilities, and data could easily be moved from one application to another, making data and applications sharable.
2. Second, there is a need for a set of general purpose indexing data structures, which would enable applications to conveniently organize, find, and view records of interest. These structures would also enable applications to have alternate views of data records, and might even enable data records to be re-organized efficiently.
3. A third requirement is for a general purpose viewer for browsing complex test data in HDF5.
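
The indexing idea in the second item can be illustrated with a minimal sketch. This is not the HDF5 API or the project's prototype: the `TimeIndex` class, its `add` and `range` methods, and the sample records are hypothetical names chosen for illustration, and the sketch uses only the Python standard library. It shows how a separate index can offer a time-ordered view of records that were stored in arrival order, without moving the records themselves.

```python
import bisect


class TimeIndex:
    """Hypothetical secondary index mapping timestamps to record positions.

    The records themselves stay in arrival order; only the index is sorted.
    """

    def __init__(self):
        self._times = []      # timestamps, kept sorted
        self._positions = []  # record position for each timestamp

    def add(self, timestamp, position):
        # Insert while keeping both lists aligned and sorted by timestamp.
        i = bisect.bisect_right(self._times, timestamp)
        self._times.insert(i, timestamp)
        self._positions.insert(i, position)

    def range(self, start, stop):
        """Positions of records with start <= timestamp < stop, in time order."""
        lo = bisect.bisect_left(self._times, start)
        hi = bisect.bisect_left(self._times, stop)
        return self._positions[lo:hi]


# Made-up records from several sources, stored in the order they arrived:
# (timestamp in seconds, source name, measured value)
records = [
    (10.0, "aircraft", 0.91),
    (10.5, "ground", 0.40),
    (9.8, "satellite", 0.12),
    (11.2, "aircraft", 0.95),
]

index = TimeIndex()
for position, (timestamp, _source, _value) in enumerate(records):
    index.add(timestamp, position)

# Time-ordered view of one window, without re-ordering the stored records.
window = [records[p] for p in index.range(9.8, 10.6)]
for record in window:
    print(record)
```

A second index keyed on the source name could be built the same way to give another view of the same records, which is the sense of "alternate views" in the item above.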

The proposed work will identify a range of applications that capture and access multi-modal time-sequenced data, and determine the needs of these applications relative to input modes, data rates, and processing and archival activities. Based on this assessment, a prototype data model and API that addresses these requirements will be developed. The work will also investigate the use of index structures in the context of this data model. Finally, the HDF viewer (HDFView) will be adapted for viewing data stored according to this model.

Current status:
In consultation with a number of potential applications, a document describing potential requirements was produced, which can be found at HDF5_Prototype_Indexing_Requirements_1.0.pdf. These requirements were used to generate a scaled-down data model for the prototype, including the data structures and operations to be available in the prototype. These are described in HDF5_Prototype_Indexing_Model_0.3.pdf.
 
Project Leads
Mike Folk, NCSA
