Fedora 4 Objectives January 6, 2015







Fedora 4 Objectives

January 6, 2015



  • Peter Eichman

  • Jennie Knies

  • Bria Parker

  • Ben Wallberg

Introduction

The purpose of this document is to outline our objectives for the development of a digital repository using Fedora 4 [1]. During this process we will put in place the framework to realize a flexible, extensible, durable, sustainable repository that will support both existing functionality and future improvements. We have three high-level objectives for the new repository.




  • Leverage repository improvements provided by the Fedora 4 application

  • Migrate selected existing services and applications

  • Develop new features

Once the objectives have been vetted and agreed upon we will design a project roadmap that will address:




  • Establishing the development, testing, and acceptance process

  • Launching production Fedora 4 repository for new content

  • Migration of existing Fedora 2 content

  • Creation of batch loading processes and tools

  • Adoption of an administrative interface

  • Migration and extension of existing public interfaces

Leverage repository improvements provided by the Fedora 4 application



Goals as stated in the Fedora Four Prospectus, https://wiki.duraspace.org/display/FF/Fedora+Four+Prospectus
The Fedora Four project will address the top priority requirements expressed by the international community over the past few years.  These will include:

  • improved performance, enhanced vertical and horizontal scalability

  • more flexible storage options

  • features to accommodate research data management

  • better capabilities for participating in the world of linked open data 

  • an improved platform for developers—one that is easier to work with and which will attract a larger core of developers.

These goals, once realized, will position Fedora as an effective platform for the next decade.
See Appendix A for a list of Fedora 4 features.

Migrate selected existing services and applications


Beyond the improvements provided by moving from Fedora 2 to Fedora 4 we need to ensure that the services and applications we build on top of the core repository maintain the same standards for performance and reliability.


Administrative Tools: The current Administrative Tools application is not robust, and the original vision for it was never fully realized. Recommendation: Investigate replacing it with a community-developed and community-maintained application.
Mediated batch ingest: Currently, batch ingest into Fedora 2.2.2 requires complex scripts or intervention by DSS staff. Recommendation: Develop a batch ingest mechanism that can be user-operated and that can perhaps work more smoothly in conjunction with the Administrative Tools.
Solr for search and reporting: Solr is a best-of-breed search and indexing tool with wide adoption. Recommendation: We will continue to use Solr with Fedora 4.
Descriptive and Administrative Metadata (UMDM and UMAM): The existing metadata schemas were created by UMD Libraries according to perceived needs in 2006. They have proven to be unnecessarily complicated and convoluted, and they are no longer useful. Recommendation: Replace the current schemas with standard schemas, such as MODS and PREMIS.
Public Interface: The existing public interface is Hippo-based. Recommendation: Continue using the Hippo-based public interface for the foreseeable future.
Derivative Generation: Currently, for image-based objects, we create a viewable derivative using a tool called Zoomify. We also create thumbnails and load master and display copies. Recommendation: We will gather requirements for more advanced derivative generation and ensure that Fedora 4 will support those requirements.
Identifiers: We currently assign a number of unique and persistent identifiers to objects in Fedora. The PID is the persistent identifier within Fedora 2.2.2. The Handle is an external persistent URL that can be mapped to whatever the existing internal structure might be. Recommendation: Fedora 4 has abandoned the PID as the unique identifier and instead uses a UUID. For migrated records, information about the PID will be retained. We will continue to use the Handle service for external links.
Exposure: We expose our data to external services via OAI-PMH. Recommendation: We will continue to do this, but also investigate other methods of providing access to our content, for example, via the RDF REST API.

Develop new features

Serve as the primary repository for the Libraries' digital assets, such as those currently maintained in Digital Collections, and provide a gateway for services to both internally and externally managed content. Centralized services include inventory, reporting, and preservation (e.g., a gateway to APTrust).


Fedora 4 interface as first-rate service

Expose the RDF REST/HTTP API for external use, with the primary audience being not just DSS developers but all Libraries staff and Libraries partners who want to interface with the repository. This means that authentication and access control must be handled at the repository level rather than by a front-end application like Admin Tools, so the security implications must be fully considered. The repository interface must be fully documented, preferably with the documentation available through the repository service itself.


Content Model

Content modeling in Fedora 4 will allow us to go beyond just modeling digital assets. We will be able to have entities (such as “University of Maryland”) as Fedora objects as well, and we can leverage the RDF capabilities of Fedora 4 to relate assets and entities.
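As a rough illustration of relating an asset to an entity, the sketch below serializes one RDF statement in N-Triples syntax. The two resource URIs and the choice of the Dublin Core "creator" property are assumptions made for the example; the actual content model has not been decided.

```python
# Illustrative only: both resource URIs are hypothetical, and dcterms:creator
# is just one property that could link an asset to an entity.
DCTERMS_CREATOR = "http://purl.org/dc/terms/creator"


def triple(subject: str, predicate: str, obj: str) -> str:
    """Serialize one statement whose object is a URI, in N-Triples syntax."""
    return f"<{subject}> <{predicate}> <{obj}> ."


entity = "http://localhost:8080/rest/entities/umd"      # hypothetical entity object
asset = "http://localhost:8080/rest/assets/photo-0001"  # hypothetical digital asset

statement = triple(asset, DCTERMS_CREATOR, entity)
print(statement)
```

Because the entity is itself a repository object with its own URI, many assets can point at the same entity rather than each repeating a text label.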

Establish minimal metadata requirements, which will be much lower than in the current Fedora 2-based repository. Support the ability to accession “raw” content with minimal user-provided metadata, automatically generated metadata, and deferred creation of full metadata. Even the raw content will have first-rate discovery, delivery, and maintenance in the administrative interface.

Content Integrity

Update the repository using transactions to batch a series of update operations that should all succeed or all fail together.


It should be difficult, if possible at all, to delete an object such that it is not recoverable. Possible solutions: audit trail of all activity; leave behind tombstone metadata; make deletion a flag only; use ACL to allow deletion for a small set of admin users
Identify a small set of key indexes which must be updated synchronously with any content update; these are guaranteed to be populated for any completed update operation. For performance reasons this set must be small enough to be updated quickly; otherwise system performance will be negatively affected.
Utilize the fixity checking provided by Fedora 4.
TBD: Versioning
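
Several of the deletion safeguards listed above (an audit trail, deletion as a flag only, and deletion restricted to a small set of admin users) can be combined. The following is a minimal in-memory sketch under those assumptions; the class and method names are hypothetical and do not correspond to any Fedora 4 API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Repository:
    """Toy in-memory store; not a Fedora 4 interface."""
    objects: dict = field(default_factory=dict)    # identifier -> record
    audit_log: list = field(default_factory=list)  # append-only activity trail

    def _audit(self, user, action, identifier):
        stamp = datetime.now(timezone.utc).isoformat()
        self.audit_log.append((stamp, user, action, identifier))

    def add(self, user, identifier, content):
        self.objects[identifier] = {"content": content, "deleted": False}
        self._audit(user, "add", identifier)

    def _set_deleted(self, user, identifier, admins, deleted, action):
        if user not in admins:
            raise PermissionError(f"{user} may not {action} objects")
        self.objects[identifier]["deleted"] = deleted  # flag only; content is kept
        self._audit(user, action, identifier)

    def delete(self, user, identifier, admins):
        self._set_deleted(user, identifier, admins, True, "delete")

    def restore(self, user, identifier, admins):
        self._set_deleted(user, identifier, admins, False, "restore")

    def get(self, identifier):
        record = self.objects[identifier]
        if record["deleted"]:
            raise LookupError(f"{identifier} has been deleted (tombstone)")
        return record["content"]
```

In this sketch a deleted object stops being served by `get`, but its content stays in place and can be recovered with `restore`.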

Identifiers

Migrate existing PIDs as an access point. Fedora 4 utilizes a UUID as the primary unique identifier.


Migrate existing handles.

TBD: Methods of uniquely and persistently identifying content could include Handles, DOIs, CoolURIs, and RDF URI.
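
One way to keep a migrated PID usable as an access point is a lookup table from the legacy PID to the new identifier. The sketch below assumes a plain in-memory mapping and a made-up PID value; how the mapping would actually be stored in Fedora 4 is not specified here.

```python
import uuid

# Hypothetical lookup table: legacy Fedora 2 PID -> new UUID-based identifier.
pid_to_uuid = {}


def migrate(pid: str) -> str:
    """Assign a new identifier to a legacy object, reusing it on repeat calls."""
    if pid not in pid_to_uuid:
        pid_to_uuid[pid] = str(uuid.uuid4())
    return pid_to_uuid[pid]


def resolve(pid: str) -> str:
    """Resolve a legacy PID to its new identifier; raises KeyError if unknown."""
    return pid_to_uuid[pid]


new_id = migrate("umd:1234")  # "umd:1234" is an invented example PID
```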


Ingest

Provide an improved ingest process with additional ingest paths. Eliminate the need for developer mediation of batch loading. Create a new batch loader with standardized input formats that can be run in a server-based automatic mode or a client-side manual mode.


Create an independent validation service that can be initiated manually by the user to pre-validate a load batch or integrated with server-side automated processing.
TBD: Deposit via SWORD protocol.
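
A pre-validation step for a load batch could look roughly like the sketch below. The batch format (a list of rows with "title" and "file" fields) and the two checks are assumptions for illustration, since the standardized input formats are still to be defined.

```python
REQUIRED_FIELDS = ("title", "file")  # assumed minimal metadata, not a decided schema


def validate_batch(rows):
    """Return a list of (row_number, message) problems; empty means the batch passed."""
    problems = []
    seen_files = set()
    for number, row in enumerate(rows, start=1):
        for name in REQUIRED_FIELDS:
            if not str(row.get(name) or "").strip():
                problems.append((number, f"missing required field: {name}"))
        file_name = str(row.get("file") or "").strip()
        if file_name:
            if file_name in seen_files:
                problems.append((number, f"duplicate file: {file_name}"))
            seen_files.add(file_name)
    return problems


batch = [
    {"title": "Campus photo", "file": "photo-0001.tif"},
    {"title": "", "file": "photo-0002.tif"},
    {"title": "Duplicate entry", "file": "photo-0001.tif"},
]
report = validate_batch(batch)
```

Because the function only inspects the batch description and returns a report, the same code could be called by a user before submitting a batch or by an automated server-side process.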

Reporting

Develop mechanisms for user-generated reports.


Development Process

We will fork the core codebase, https://github.com/fcrepo4/fcrepo4, to use for our local development, for merging new features into our repository, and for submitting code contributions back to the core codebase.


To maximize productivity and collaboration between existing and new development team members, we will make a complete runtime environment available on the local development workstation. Development of core repository features and ancillary applications will take place primarily in the local environment, with the ability to work completely offline.
Minimize the amount of custom code by using existing applications, libraries, and services. Participate in development of the core Fedora 4 code base as well as any other supporting code.
Follow Fedora 4 community best practices for test-driven development, automated testing, and continuous integration testing.
TBD: UMD specific API/Library built on top of standard LDP or Fedora 4 client.

Authorization / Access Control

Provide different levels of access to users based on their assigned roles. Provide fine-grained, rule-based access control at the repository level, with the ability to override at the node level.
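
The interaction between repository-level rules and node-level overrides can be sketched as a lookup that walks from a node up toward the root and uses the nearest override it finds. The role names, permissions, and paths below are invented for the example and are not Fedora 4 configuration.

```python
# Repository-wide defaults per role (hypothetical roles and permissions).
REPOSITORY_RULES = {
    "admin": {"read", "write", "delete"},
    "editor": {"read", "write"},
    "public": {"read"},
}

# Overrides attached to individual nodes; they also apply to descendants.
NODE_OVERRIDES = {
    "/collections/restricted": {"public": set()},
}


def permissions_for(role: str, path: str) -> set:
    """Return the permissions a role has on a node, honoring the nearest override."""
    node = path.rstrip("/")
    while node:
        override = NODE_OVERRIDES.get(node, {})
        if role in override:
            return override[role]
        node = node.rpartition("/")[0]  # step up to the parent node
    return REPOSITORY_RULES.get(role, set())
```

Here the "public" role loses read access to everything under the restricted collection, while other roles and other collections fall back to the repository-level rules.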


Appendix A

Fedora Commons Repository 4 features, collected from the Fedora Four wiki, https://wiki.duraspace.org/display/FF


Admin Search

Searching any entered properties is supported by Fedora 4 out-of-the-box.


Authorization

Fedora 4 authorization is designed to be fine grained, while at the same time manageable by administrators and end users. Authentication is tied to the servlet container or OAuth tokens, but authorization mechanisms are open to extension and many reference implementations are included. Roles-based access control is an included feature that makes permissions more manageable and at the same time easier for external applications to retrieve, index and enforce. Finer grained security checks have no impact on the performance of requests that have a Fedora administrator role.


Backup and Restore

The Fedora 4 Backup capability allows a user, such as the repository manager, to make a REST call to have the repository binaries and metadata exported to the local file system. Conversely, the Restore capability allows a user to make a REST call to have the repository restored from the contents of a previous Backup.


External Search

To support the differing needs for sophisticated, rich searching, Fedora 4 comes with a standard mechanism and integration point for indexing content in an external service.  This could be a general search service such as Apache Solr or a standalone triplestore such as Sesame or Fuseki.


External Triplestore

RDF support is a core feature of Fedora 4, used as the primary data format for the REST API.  A triplestore is not bundled into the repository itself.  Instead, Fedora 4 sends events when the repository is updated, and the Indexer copies RDF from the repository to an external triplestore to keep it in sync with the repository.  This pattern, which is also used for search functionality for the same reasons, allows maximum flexibility about what triplestore to use, and removes the overhead of keeping the triplestore in sync from the core repository functionality.
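
The event-driven pattern described above can be sketched with an in-memory queue standing in for the messaging layer and a dictionary standing in for the external triplestore. All names here are illustrative assumptions; this is not the Fedora 4 Indexer.

```python
from collections import deque

repository = {}      # path -> RDF statements held by the repository
events = deque()     # stand-in for the stream of update events
external_index = {}  # stand-in for an external triplestore or search index


def update_repository(path, triples):
    """Change the repository and announce the change; no indexing happens here."""
    repository[path] = triples
    events.append(("updated", path))


def remove_from_repository(path):
    repository.pop(path, None)
    events.append(("removed", path))


def run_indexer():
    """Consume pending events and bring the external index back in sync."""
    while events:
        kind, path = events.popleft()
        if kind == "updated" and path in repository:
            external_index[path] = list(repository[path])
        else:
            external_index.pop(path, None)
```

The update functions never touch the external index directly, which mirrors the point that keeping it in sync is not part of core repository work.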


Federation

To aid in ingest or to provide services for external content, Fedora 4 has the ability to expose that content as if it were included in the repository.  Federation may be useful for migrating content into Fedora 4 or serving large files already on disk.



Namespaces

Within Fedora 4, object and datastream properties may belong to any namespace providing semantic assertions that support interoperable metadata.


RESTful HTTP API

The Fedora 4 HTTP API is generally a RESTful API. HTTP methods like GET, PUT, POST and DELETE are implemented on most resource paths. The API also relies heavily on content negotiation to deliver context-appropriate responses, and a HATEOAS-driven text/html response (providing a decent GUI experience on top of the repository).
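
Content negotiation means the same resource path can yield different representations depending on the Accept header. The sketch below only builds the requests with Python's standard library and does not send them; the URL is a hypothetical local address.

```python
from urllib.request import Request

RESOURCE_URL = "http://localhost:8080/rest/example"  # hypothetical resource path


def build_request(url: str, media_type: str) -> Request:
    """Prepare a GET request asking for one specific representation."""
    return Request(url, headers={"Accept": media_type}, method="GET")


turtle_request = build_request(RESOURCE_URL, "text/turtle")  # RDF for machines
html_request = build_request(RESOURCE_URL, "text/html")      # browsable page for people
```

Passing either object to `urllib.request.urlopen` would perform the call against a running repository.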


Transactions

Fedora 4 supports the ability to wrap multiple REST API calls into a single transaction that can be committed or rolled back as an atomic operation.
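
The all-or-nothing behavior can be illustrated with a small in-memory model in which changes are staged on a working copy and only become visible on commit. This is a conceptual sketch with invented names, not the Fedora 4 transaction API.

```python
import copy


class Transaction:
    """Stage changes on a copy of the store; publish them only on commit."""

    def __init__(self, store: dict):
        self.store = store
        self.working = copy.deepcopy(store)

    def put(self, path, value):
        self.working[path] = value

    def commit(self):
        self.store.clear()
        self.store.update(self.working)

    def rollback(self):
        self.working = copy.deepcopy(self.store)  # discard staged changes


def apply_batch(store: dict, operations) -> bool:
    """Apply every (path, value) pair, or none of them if any one is invalid."""
    tx = Transaction(store)
    try:
        for path, value in operations:
            if value is None:  # stand-in for any failed update operation
                raise ValueError(f"invalid content for {path}")
            tx.put(path, value)
    except ValueError:
        tx.rollback()
        return False
    tx.commit()
    return True
```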


Versioning

Within Fedora 4, snapshots of the current state of an object may be saved into the version history.  The properties or content of a node saved in these versions may be accessed later to serve as a historical record of the object.  Future feature development may allow for easy export of the entire history or other useful actions.


Clustering

To support horizontal scalability use cases as well as geographic distribution, Fedora 4 can be configured as a cluster of application servers. Various configuration options are available for clusters, depending upon the use case and server environment.


Identifiers

Identifiers can be specified in REST API calls, generated automatically using the internal PID minter, or generated using an external REST service.


Linked Data Platform

The W3C Linked Data Platform (LDP) specification describes a set of best practices and a simple approach for a read-write Linked Data architecture, based on HTTP access to web resources that describe their state using the RDF data model. Fedora 4 implements the LDP specification for create, read, update and delete (CRUD), allowing HTTP, REST, and linked data clients to make requests to Fedora 4.





1 For information on the latest Fedora 4 release, see: http://duraspace.org/node/2394.

Rev. date: January 6, 2015



University of Maryland Libraries


