Mr Samuel Goebert

CSCAN Network Research Student

Distributed preservation and synchronization of Open Data between untrusted machines

The Internet is changing from a web of documents to a web of data. Open data initiatives such as Wikipedia, the Internet Archive, Stack Exchange and OpenStreetMap have become important sources of global knowledge. Although they are built mainly by volunteers, the quality of their data has reached, and in some cases exceeded, that of proprietary offerings. Open data is freely available and everybody is invited to contribute, but update operations have to happen on the main database or they will be ignored: the data is locked into a centralized architecture. Current efforts to archive open data are based on mirroring offsite copies of the main database. This prevents the raw binary data stream from getting lost, but the copies are disconnected from the original and must be updated at regular intervals to reflect the state of the main database. Direct editing of a mirror is possible but creates a fork, because update operations are not synchronized back to the main project. With many forks of the same data, it becomes difficult to determine the leading fork that should be preserved and contributed to.
The MPhil stage will focus on research into distributed systems and consensus-finding algorithms. Existing algorithms are not suited to synchronizing data between untrusted machines, because they assume that all parties are known and share the same trust level. The research at this stage will therefore consider a decentralized protocol that synchronizes data between untrusted machines. Synchronizing the data in a multi-master database replication setup enables user contributions from multiple sources. This will lead to a specification of a peer-to-peer protocol that enables anonymous contributions in a formalized way.

Director of studies: Prof. Dr Bettina Harriehausen-Mühlbauer
Other supervisors: Prof. Dr Christoph Wentzel, Prof. Steven M Furnell

Conference papers

Towards A Unified OAI-PMH Registry
Goebert S, Harriehausen-Mühlbauer B, Furnell SM
Proceedings of the 11th IS&T Archiving Conference, pp97-100, ISBN: 978-0-89208-309-1, 2014

Decentralized Hosting and Preservation of Open Data
Goebert S, Harriehausen-Mühlbauer B, Wentzel C
Proceedings of the 10th IS&T Archiving Conference, pp264-269, ISBN: 978-1-63266-642-0, 2013

A non-proprietary RAID replacement for long term preservation systems
Goebert S, Sarti A
8th International Conference on Preservation of Digital Objects, November 1-4, 2011, Singapore, pp254-256, ISBN: 978-981-07-0441-4, 2011
