Crane 1.0.0 released!

Publishing is an integral part of the data analysis process. Whether it’s in the form of code, reports or technical documentation, at some point artifacts need to be shared. More often than not, such artifacts are confidential and their access needs to be properly secured. There exist solutions for at least some types of artifacts, but we needed one simple tool that could help us with all the use cases we encounter in our daily practice, built with modern technology and security standards.

Crane is a new open source product to host data science artifacts: data analysis reports, documentation sites, or packages and libraries. It is an integral part of our open source suite to build data science platforms and plays well with ShinyProxy and RDepot.

Crane has been designed to comply with the strictest industry regulations in terms of security and auditing and has been widely popular amongst our customers.

Why?

  • All of your data science artifacts are under strict authentication and authorization using modern protocols (OIDC)
  • Fine-grained access control is organized in an intuitive hierarchical tree
  • The artifacts can be pushed into Crane using an API (e.g. to automate report updates) or using a UI (for manual uploads)
  • Full audit logs are available to track operations on all files (e.g. for GxP purposes)
  • All configuration can be stored in Git and Crane fully supports infrastructure-as-code (IaC)

In the sections below, we dive deeper into the many features of Crane and give examples of how it can be used.

High security and compliance requirements

Crane is designed with high security and compliance requirements in mind.

It provides declarative authorization rules ensuring that only authorized users can access the data.

app:
  repositories:
    protected_repository:
      read-access:
        users: [ jack, jeff ]
      write-access:
        groups: [ writer, author ]
    authentication_required_repository:
      read-access:
        any-authenticated-user: true
    publicly_available_repository:
      read-access:
        public: true

Authorization rules for read and write access are defined separately, which keeps them explicit while remaining flexible. For cases that require additional security, Crane also supports POSIX access control lists (ACLs) to control access to specific files or directories.
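To make the semantics of the example configuration above more concrete, here is a minimal Python sketch of how such declarative rules could be evaluated. It is an illustrative model, not Crane's actual implementation: the `AccessRule`, `User`, `is_allowed` and `can` names are hypothetical, and the deny-by-default behaviour for a missing rule is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AccessRule:
    """Hypothetical in-memory form of one read-access or write-access block."""
    users: frozenset = frozenset()
    groups: frozenset = frozenset()
    any_authenticated_user: bool = False
    public: bool = False


@dataclass(frozen=True)
class User:
    """An authenticated user; anonymous visitors are represented by None."""
    name: str
    groups: frozenset = frozenset()


def is_allowed(rule: Optional[AccessRule], user: Optional[User]) -> bool:
    if rule is None:
        return False  # assumption: no rule means no access
    if rule.public:
        return True  # public repositories are open to everyone
    if user is None:
        return False  # everything else requires authentication
    if rule.any_authenticated_user:
        return True
    return user.name in rule.users or bool(user.groups & rule.groups)


# The three repositories from the configuration example above.
REPOSITORIES = {
    "protected_repository": {
        "read": AccessRule(users=frozenset({"jack", "jeff"})),
        "write": AccessRule(groups=frozenset({"writer", "author"})),
    },
    "authentication_required_repository": {
        "read": AccessRule(any_authenticated_user=True),
    },
    "publicly_available_repository": {
        "read": AccessRule(public=True),
    },
}


def can(action: str, repository: str, user: Optional[User]) -> bool:
    """Check a 'read' or 'write' action against the separately defined rules."""
    rule = REPOSITORIES.get(repository, {}).get(action)
    return is_allowed(rule, user)
```

In this model, `can("read", "protected_repository", User("jack"))` is allowed, while the same user cannot write, because write access is granted only to the `writer` and `author` groups.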

Support for multiple storage backends

Crane currently supports multiple storage backends, including S3 and local file system. This allows users to store and access data at scale in the cloud or on-premises. Each repository can have its own storage location.

Hosting R and Python package repositories

Crane can be used to serve R and Python package repositories, either within a company network or on a publicly accessible one. Thanks to the advanced access control, only users with the correct permissions can access a repository. Packages can be installed with the native R and Python clients, so security doesn’t become a burden for users.

Data science storage

As a data science storage solution, Crane stores all your data and only allows access by authorized users. The data can be accessed using the web UI or HTTP API, allowing users to directly browse and download the data.

In addition, the data can be accessed by data science applications, for example to store the underlying data for a Shiny app. Usually, the Shiny app has direct access to all the data (for that specific app). Sometimes, however, different users are only allowed to access certain datasets. In that case, the Shiny app can use the identity of the user to download the data from Crane. This ensures that authorization to the data is verified by Crane (instead of by the Shiny app) and solves a long-standing issue in the data science web app space.
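The idea of an app downloading data on behalf of the signed-in user can be sketched as follows, shown in Python for brevity. This is a sketch under assumptions rather than an official client: it assumes the app has the user's OIDC access token at hand, that Crane accepts it as a bearer token, and that files are served under `<base URL>/<repository>/<path>`. The function names and the example host are hypothetical.

```python
import urllib.request
from urllib.parse import quote


def build_download_request(base_url, repository, file_path, access_token):
    """Build a GET request that carries the end user's own access token.

    Assumptions: bearer-token authentication and a
    <base_url>/<repository>/<file_path> URL layout.
    """
    url = f"{base_url.rstrip('/')}/{quote(repository)}/{quote(file_path)}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {access_token}"},
        method="GET",
    )


def download_for_user(request, timeout=30):
    """Send the request and return the file contents as bytes.

    If the server rejects the user (for example with 401 or 403),
    urlopen raises urllib.error.HTTPError, so the app never has to
    decide by itself who may see which dataset.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Because the token belongs to the user rather than to the app, two users of the same app can end up with different datasets without any authorization logic in the app itself.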

Upload methods

Users can upload new data using the web UI or HTTP API.

With the HTTP API, uploads can be automated, for example to publish reports automatically from your favorite CI/CD technology or pipeline.
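As a rough illustration of such an automated upload, a pipeline step could look like the Python sketch below. The request shape is an assumption, not the documented Crane API: the HTTP method, the URL layout, the raw request body and the bearer token should all be checked against the Crane documentation. The `CRANE_URL` and `CRANE_TOKEN` environment variables and the function names are hypothetical.

```python
import os
import urllib.request
from pathlib import Path


def build_upload_request(base_url, target_path, local_file, access_token, method="POST"):
    """Build an authenticated upload request for one file.

    Assumptions: the file is sent as the raw request body to
    <base_url>/<target_path>, and the HTTP method is POST.
    """
    body = Path(local_file).read_bytes()
    url = f"{base_url.rstrip('/')}/{target_path.lstrip('/')}"
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/octet-stream",
        },
        method=method,
    )


def publish_report(local_file, target_path):
    """Upload a rendered report from a CI/CD job and return the HTTP status."""
    request = build_upload_request(
        os.environ["CRANE_URL"],    # hypothetical CI variable: Crane base URL
        target_path,
        local_file,
        os.environ["CRANE_TOKEN"],  # hypothetical CI secret: access token
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.status
```

Keeping the URL and token in pipeline secrets means the same step can publish to different Crane instances without code changes.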

Logs for auditing

All access to data is logged in the (optional) audit log of Crane. This feature supports running Crane in qualified and validated environments for data storage and/or analysis.

Bring your own web UI

Crane ships with a minimal web UI that doesn’t use any CSS libraries, so you can customize the look and feel of the app. This way you can easily bring your own UI and use whichever CSS framework you’re most comfortable with.

Tested

To ensure Crane can perform in high-security settings, the code base is covered by integration tests that reach a code coverage of more than 70%.

Documentation and support

Full release notes can be found on the downloads page, and updated documentation is available at https://craneserver.net. As always, community support for this new release is available at

https://support.openanalytics.eu

Don’t hesitate to send in questions or suggestions and have fun with Crane and friends!