
Presentation

Starfish: Metadata-Driven Lifecycle Management, Analytics and Orchestration for File Systems and Object Stores at Scale
Description
Starfish is a unique software application that delivers a holistic approach to unstructured data management at scale. Starfish is based on a simple paradigm that is incredibly versatile and powerful: we combine DISCOVERY with EXECUTION, enabling users to identify files that meet very specific criteria and then do anything to those files that can be expressed in code.

The discovery component of Starfish is a data catalog for unstructured data. It indexes file systems and object stores and stores their metadata in an SQL database. Starfish uses a number of strategies to scan and maintain the index on extremely large file systems, such as those found in DOE labs, major universities, corporate R&D facilities and hedge funds. The catalog retains the file system history, tracking files and directories that have been moved or deleted. It also stores metadata as tags and key-value pairs, which enables more specific queries and keeps track of the operations performed on individual files.
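To make the catalog idea concrete, here is a minimal Python sketch, assuming a single SQLite database and a simplified schema. The table layout and the `open_catalog`, `scan`, `tag`, `set_kv` and `query` functions are hypothetical illustrations, not Starfish's actual schema or API. The sketch shows three ideas from the paragraph above: file metadata kept in SQL, history retained by marking deleted files instead of dropping their rows, and tags and key-value pairs that narrow queries.

```python
import os
import sqlite3
import time

# Hypothetical schema: one row per file path, plus side tables for tags and key-value pairs.
SCHEMA = """
CREATE TABLE IF NOT EXISTS files (
    path       TEXT PRIMARY KEY,
    size       INTEGER,
    mtime      REAL,
    deleted_at REAL               -- NULL while the file still exists
);
CREATE TABLE IF NOT EXISTS tags (path TEXT, tag TEXT, PRIMARY KEY (path, tag));
CREATE TABLE IF NOT EXISTS kv (path TEXT, key TEXT, value TEXT, PRIMARY KEY (path, key));
"""


def open_catalog(db_path=":memory:"):
    """Open (or create) a catalog database using the hypothetical schema."""
    conn = sqlite3.connect(db_path)
    conn.executescript(SCHEMA)
    return conn


def scan(conn, root):
    """Record metadata for every file under root; mark vanished files as deleted."""
    seen = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            seen.add(full)
            conn.execute(
                "INSERT OR REPLACE INTO files (path, size, mtime, deleted_at) VALUES (?, ?, ?, NULL)",
                (full, st.st_size, st.st_mtime),
            )
    # Keep history: files no longer on disk get a deletion timestamp instead of being removed.
    prefix = os.path.join(root, "")
    now = time.time()
    live = [p for (p,) in conn.execute("SELECT path FROM files WHERE deleted_at IS NULL")]
    for p in live:
        if p.startswith(prefix) and p not in seen:
            conn.execute("UPDATE files SET deleted_at = ? WHERE path = ?", (now, p))
    conn.commit()


def tag(conn, path, name):
    """Attach a tag to a path (adding the same tag twice is a no-op)."""
    conn.execute("INSERT OR IGNORE INTO tags (path, tag) VALUES (?, ?)", (path, name))
    conn.commit()


def set_kv(conn, path, key, value):
    """Store or overwrite a key-value pair for a path."""
    conn.execute("INSERT OR REPLACE INTO kv (path, key, value) VALUES (?, ?, ?)", (path, key, value))
    conn.commit()


def query(conn, tag_name=None, min_size=0):
    """Return live paths of at least min_size bytes, optionally limited to one tag."""
    sql = "SELECT f.path FROM files f"
    params = []
    if tag_name is not None:
        sql += " JOIN tags t ON t.path = f.path AND t.tag = ?"
        params.append(tag_name)
    sql += " WHERE f.deleted_at IS NULL AND f.size >= ? ORDER BY f.path"
    params.append(min_size)
    return [p for (p,) in conn.execute(sql, params)]
```

This sketch rescans the whole tree with `os.walk` on every pass, which is only practical for small trees; the source notes that Starfish uses several scanning strategies to keep the index current on very large file systems, and those are not modeled here.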

The execution portion of Starfish is a scale-out data mover and batch processor that uses the output of a query as the input of a batch job. Starfish has built-in commands for data movement, disposition, metadata extraction, hash calculations and many other common operations. It also executes scripts written in any popular language. Jobs are automatically run in parallel across multiple threads and servers.
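The query-to-batch pattern can be sketched in a few lines of Python, assuming the query result is simply a list of paths and the job is a SHA-256 hash calculation, one of the built-in operation types named above. The `sha256_of` and `run_batch` functions are hypothetical and are not Starfish's job interface; the sketch also parallelizes across threads on a single host only, not across multiple servers.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in fixed-size chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def run_batch(paths, action, workers=8):
    """Apply action to every path in parallel threads; return {path: result or exception}."""
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(action, p): p for p in paths}
        for fut, p in futures.items():
            try:
                results[p] = fut.result()
            except Exception as exc:
                # Record per-file failures (e.g. a file deleted after the query ran)
                # instead of aborting the whole job.
                results[p] = exc
    return results
```

Recording exceptions per path, rather than letting the first failure stop the job, reflects the fact that a file system can change between the moment a query runs and the moment the batch job reaches a given file.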

Starfish is commonly used for reporting and analytics, archiving, backup, ROT (redundant, obsolete, trivial) data cleanup, instrumentation workflows and data curation workflows. Its web portal allows individual users to participate in storage management.
Event Type
Exhibitor Forum
Time
Thu, 18 Nov, 1:30pm - 2pm CST
Location
263
Registration Categories
TP
XO / EX
Tags
Data Management
Extreme Scale Computing