terasaur - Gigabytes to Terabytes: A Hybrid Cloud Approach to Publishing Large Files
Cloud storage providers make it easy to store and share small files, but significant challenges still exist when publishing and distributing large files. terasaur addresses this gap and offers a platform for consumers, researchers, and institutions to manage and share files between 1 GB and 1 TB in size.
Storage capacity and file sizes have increased dramatically in recent years. Linux distribution DVD ISO images typically exceed a gigabyte, high-quality video files and virtual machine images reach hundreds of gigabytes, and research data sets easily stretch into the terabytes. Such large files present a challenge for digital content management and data transfer. HTTP and FTP remain the standards for accessing information and downloading files. However, these protocols offer little protection against network interruptions or file corruption at the source.
terasaur is a Web-based file and data distribution platform targeting objects and collections between 1 GB and 1 TB in size. BitTorrent serves as the favored transfer method due to its handling of very large files, its built-in stop/restart functionality, and its wealth of open-source clients. Researchers and individual consumers can use a standard BitTorrent client to download files.
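The stop/restart behavior follows from the way BitTorrent splits content into fixed-size pieces and records a SHA-1 hash for each piece in the torrent metadata. The sketch below illustrates that general idea only; it is not terasaur code, the function names are hypothetical, and the piece size is deliberately tiny for readability.

```python
import hashlib

# Tiny piece size for illustration only; real torrents use far larger pieces.
PIECE_LENGTH = 4


def piece_hashes(data: bytes, piece_length: int = PIECE_LENGTH) -> list[bytes]:
    """Return the SHA-1 digest of each fixed-size piece of data."""
    return [
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    ]


def pieces_to_fetch(
    partial: bytes,
    expected: list[bytes],
    piece_length: int = PIECE_LENGTH,
) -> list[int]:
    """Return indices of pieces that are missing or corrupt in a partial download."""
    have = piece_hashes(partial, piece_length)
    return [
        index
        for index, digest in enumerate(expected)
        if index >= len(have) or have[index] != digest
    ]
```

A client that restarts after an interruption only needs to request the pieces this check reports, rather than the whole file, and a corrupted piece is detected in the same pass.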
Typical BitTorrent tools work well for consumers but do not meet the needs of large content publishers. terasaur enhances BitTorrent by making data objects discoverable, authoritative, and persistent. Users are guided to enter structured metadata to facilitate discovery. The terasaur discovery engine is based on the Dublin Core metadata standard. Content publishers can attach rich descriptions and classifications to objects in the system. The application supports local search and browse activities and exposes metadata for indexing by search engines. RSS and Atom feeds facilitate continued engagement between content providers and their communities.
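As an illustration of how Dublin Core metadata might be exposed to search engines, the sketch below renders a record as HTML meta tags using the common "DC." naming convention. The fifteen element names come from the Dublin Core element set; the function name, the record layout, and the choice of HTML meta tags as the encoding are assumptions, not a description of terasaur's actual implementation.

```python
from html import escape

# The fifteen elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def dublin_core_meta_tags(record: dict[str, str]) -> list[str]:
    """Render a metadata record as HTML meta tags, in element-set order.

    Hypothetical helper: rejects keys that are not Dublin Core elements.
    """
    unknown = set(record) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    return [
        f'<meta name="DC.{name}" content="{escape(record[name], quote=True)}">'
        for name in DC_ELEMENTS
        if name in record
    ]
```

Restricting records to a fixed vocabulary is what makes them uniformly searchable within the application and indexable by outside crawlers.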
terasaur draws from social network and web of trust models for its content publication and quality control framework. This framework allows content owners to ensure the authenticity of material and gives community members the ability to contribute feedback.
Finally, a BitTorrent server module (Seed Bank) enables institutions to easily plug into terasaur and share large collections. Objects in the Seed Bank persist as long as the owner wishes to make them available, perhaps indefinitely. File storage follows a hybrid cloud model. When publishing data into terasaur, a user selects the destination from a list of available Seed Banks. The list may include a Seed Bank housed in a local data center or a third-party Seed Bank hosted in a cloud environment. This allows a user to weigh data management policies, storage capacity, and convenience when publishing files.
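The destination choice described above could be modeled roughly as follows. This is a minimal sketch under stated assumptions: the SeedBank fields, the function name, and the two filtering criteria (free capacity and a policy flag that forbids cloud hosting) are hypothetical and are not taken from terasaur itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeedBank:
    """Hypothetical description of one publishing destination."""
    name: str
    hosted_in_cloud: bool  # False for a Seed Bank in a local data center
    free_bytes: int


def eligible_seed_banks(
    banks: list[SeedBank],
    object_size: int,
    allow_cloud: bool = True,
) -> list[SeedBank]:
    """Return the Seed Banks that can hold the object under the given policy."""
    return [
        bank
        for bank in banks
        if bank.free_bytes >= object_size
        and (allow_cloud or not bank.hosted_in_cloud)
    ]
```

A publisher whose data management policy requires on-premises storage would pass allow_cloud=False and pick from the local Seed Banks that remain.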