Documentation

This is the documentation for the ScienceBox project.

We recommend starting with the Overview.

If you find a typo or want to contribute to the documentation, see the Contribution Guidelines.

1 - Overview

How can ScienceBox help you?

What is it?

ScienceBox is an integrated software bundle with storage and computing services for general-purpose and scientific use. It features container-based versions of distributed scalable storage, sync and share functionality, and a web-based data analysis service. It can be deployed on a single machine or scaled out across multiple servers using Helm and Kubernetes.

Service Portfolio and Technology Stack

ScienceBox delivers an integrated set of services for storage, sync and share, and data analysis, and enables their deployment on a Kubernetes cluster using the Helm package manager:

  • EOS: Distributed storage service used at CERN to host all the Physics data and user files.
  • CERNBox: The sync and share platform for science that leverages EOS storage and provides cloud-based storage and sharing functionalities. CERNBox is based on the open-source ownCloud Infinite Scale software and expands it by providing tight integration with other services available at CERN.
  • SWAN: The Jupyter Notebook service at CERN that provides an advanced data analysis environment through a web interface. SWAN is built on top of the upstream Jupyter software by integrating storage from EOS.
  • CVMFS: The service for software distribution at a global scale adopted by the Worldwide LHC Computing Grid (WLCG) for the delivery of experiment analysis software.

Why do I want it?

  • What is it good for?: ScienceBox uses Helm to package the services so that they can be installed on a Kubernetes cluster with a single command, instead of writing YAML manifests by hand.

  • What is it not good for?: ScienceBox is an all-in-one package to deploy CERN services, and it comes with its share of complexity, stacked services and inter-dependencies. If you just want to run SWAN, EOS or CERNBox independently, it is better to look at other deployment options.

Where should I go next?

Head to the Getting Started section to install and run ScienceBox.

2 - Getting Started

Install and Run ScienceBox

ScienceBox uses current cloud-native technologies to package and distribute its services so that they can run on any cloud, whether on-premise or commercial. It relies on Helm to manage and install all Kubernetes manifests. We use the concept of “Umbrella Charts”: ScienceBox expresses dependencies on various upstream sub-charts, which makes it highly pluggable and configurable and gives users the flexibility to install and configure each sub-chart as they want.

There are two ways to install ScienceBox charts:

  • Minikube Installation
  • Production Installation on a Kubernetes Cluster

Before discussing the installation methods, note that certain prerequisites must be installed:

Prerequisites

To install and run ScienceBox on your Kubernetes cluster, a set of tools and software must be installed. ScienceBox has been developed and tested on:

  • OS: CentOS 7.9 (kernel version: 3.10), Ubuntu 20.04 (kernel version: 5.13.0-35-generic)
  • Docker: 20.10.12
  • Kubernetes: 1.20.15
  • Helm: 3.8.0
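
As a quick sanity check before installing, the following minimal sketch reports which of the tools above are available on the current machine. It only checks that the commands exist on the PATH and does not compare versions; including kubectl in the list is an assumption, since the list above names Kubernetes rather than a specific client.

```shell
#!/bin/sh
# Report which prerequisite tools are present on this machine.
# The tool names checked below are assumptions based on the list above.

# Print "ok" if the given command is found on the PATH, "missing" otherwise.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "ok"
    else
        echo "missing"
    fi
}

# One line per tool: name followed by its status.
for tool in docker kubectl helm; do
    printf '%-8s %s\n' "$tool" "$(check_tool "$tool")"
done
```

If a tool is reported as missing, install it before continuing with either installation method.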

Installation

Minikube installation

We provide a demonstrator version of ScienceBox, called mboxed, that installs all the Helm charts on Minikube. Mboxed is a one-click installation of ScienceBox and can be considered a self-contained, containerized demo of cloud storage and computing services for scientific and general-purpose use.

A dedicated repository contains all the installation scripts and provides a single script to install all the workloads on a Minikube-based Kubernetes cluster.

Follow these steps to install ScienceBox on Minikube:

# clone the repo
$ git clone https://github.com/sciencebox/mboxed.git
$ cd mboxed

# install the required software
$ ./SetupInstall.sh

# Install sciencebox
$ ./ScienceBox.sh

After the installation, you can access ScienceBox at https://${HOSTNAME}/sciencebox.

Installing on Multi-node Kubernetes Cluster

ScienceBox can also be installed on a multi-node Kubernetes cluster. Helm simplifies the deployment process and gets your workloads up and running with just a couple of commands. (This section assumes that all the prerequisites are already installed on your machine.)

To install the ScienceBox umbrella chart on your Kubernetes cluster, run:

helm repo add sciencebox https://registry.cern.ch/chartrepo/sciencebox
# Helm 3 requires a release name; "sciencebox" is used here
helm install sciencebox sciencebox/sciencebox

Note that you need to configure certain parameters before running the installation. The configuration for the parameters can be found here.
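
One common way to set such parameters with Helm is to dump the chart's default values to a file, edit it, and pass it to the install command. The sketch below wraps the two standard Helm commands for this in small functions. The release name and the values file name are illustrative assumptions, and no ScienceBox parameter names are implied.

```shell
#!/bin/sh
# Sketch only: the release name and values file name passed to these
# functions are chosen by the caller and are not defined by ScienceBox.

# Allow the helm binary to be overridden (for example HELM=echo for a dry run).
HELM="${HELM:-helm}"

# Write the chart's default values to the given file so they can be edited.
dump_default_values() {
    "$HELM" show values sciencebox/sciencebox > "$1"
}

# Install the umbrella chart under the given release name,
# applying the overrides from the given values file.
install_with_values() {
    "$HELM" install "$1" sciencebox/sciencebox --values "$2"
}
```

For example, run dump_default_values my-values.yaml, edit the file, and then run install_with_values sciencebox my-values.yaml. Setting HELM=echo prints the commands instead of executing them.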

Try it out!

After the charts are installed, users can access ScienceBox at https://${HOSTNAME}/sciencebox, where they are greeted by the welcome screen:

[Figure: ScienceBox welcome screen]

The services can be accessed through the following URLs:

  • Homepage: https://${HOSTNAME}/sciencebox
  • SWAN: https://${HOSTNAME}/swan
  • CERNBox: https://${HOSTNAME}
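
To check that these endpoints respond after installation, a small script along the following lines can be used. It is a sketch under assumptions: the function names are hypothetical, and the -k option of curl (skip TLS certificate verification) is included on the assumption that a demo deployment may use a self-signed certificate.

```shell
#!/bin/sh
# Print the ScienceBox service URLs listed above for a given hostname.
service_urls() {
    host="$1"
    echo "https://${host}/sciencebox"
    echo "https://${host}/swan"
    echo "https://${host}"
}

# Request each URL and print the HTTP status code next to it.
# -k skips certificate verification (assumption: self-signed certificate).
probe_services() {
    service_urls "$1" | while read -r url; do
        code=$(curl -k -s -o /dev/null -w '%{http_code}' "$url")
        echo "$code $url"
    done
}
```

Calling probe_services "$HOSTNAME" prints one line per service with its status code.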

3 - Concepts

Deep dive into ScienceBox charts and sub-components

ScienceBox is a complex project with many interdependencies, and it can be daunting for a user or a potential contributor. This section tries to make life easier by describing the core concepts of the ScienceBox project and how the solution was built from the ground up.

3.1 - Architecture Reference

ScienceBox architecture and component reference

As already mentioned, ScienceBox is a software bundle packaged as a Helm chart to deploy CERN IT services on Kubernetes. These services are themselves complex pieces of software that are deployed independently at CERN. (Side note: not all of the services offered by ScienceBox run natively on Kubernetes at CERN.)

ScienceBox was created so that all the services, namely CERNBox, EOS, CVMFS and SWAN, could be deployed outside CERN with ease. Helm charts proved to be the most hassle-free way to ship all of these services in a package that can be easily deployed on a Kubernetes cluster, and were therefore chosen to package them. Besides ease of deployment, Helm charts are also highly configurable, so the deployment can be adjusted to one's needs.

ScienceBox is a single Helm chart that contains multiple sub-charts, which together function as a whole. In the Helm community this practice is referred to as an “Umbrella” chart and is the de facto standard for embedding multiple components in a single package. The ScienceBox chart expresses dependencies on the CERNBox, EOS, and SWAN “sub-charts”. This can be visualized with the architecture below:

As seen in the above architecture, ScienceBox embeds all the individual components and configures them to run together. Along with the major components, ScienceBox also requires some “satellite components” to glue all the services together. The detailed working of each service and its corresponding glue components is described in the respective subsections.

To summarize, the ScienceBox umbrella consists of the following sub-charts:

  • CERNBox Charts:
    • Revad Charts - Backbone of CERNBox, interoperability platform for sync and share systems.
      • 3 StorageProviders - Interface to EOS
      • AuthProvider - Revad Authentication service
    • CERNBox Web - Nginx server that serves CERNBox web.
    • OwnCloud Infinite Scale Charts - oCIS charts to run the oCIS extensions IDP and Proxy. IDP is the Identity Provider used for authentication.
  • EOS Charts:
    • 1 MGM - headnode of the cluster
    • 4 FST - storage daemons to write files’ payload
    • 3 QDB - highly available namespace and instance configuration
  • SWAN Charts
    • Fusex - EOS Client
    • JupyterHub - Upstream JupyterHub charts
  • CVMFS Charts
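
The sub-chart structure summarized above can be inspected with standard Helm commands. The sketch below assumes the sciencebox repository has already been added, and that the chart declares its dependencies in Chart.yaml (the Helm 3 convention). The function names and the target directory are hypothetical.

```shell
#!/bin/sh
# Allow the helm binary to be overridden (for example HELM=echo for a dry run).
HELM="${HELM:-helm}"

# Print the umbrella chart's metadata (its Chart.yaml), which for
# Helm 3 style charts includes the list of dependencies.
show_umbrella_chart() {
    "$HELM" show chart sciencebox/sciencebox
}

# Download and unpack the chart into the given directory,
# then list its declared sub-charts and their status.
list_subcharts() {
    "$HELM" pull sciencebox/sciencebox --untar --untardir "$1" &&
        "$HELM" dependency list "$1/sciencebox"
}
```

For example, list_subcharts /tmp/charts unpacks the chart under /tmp/charts/sciencebox and prints one row per dependency.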

Satellite Components:

3.2 - CERNBox Charts

CERNBox charts and description

This section gives a brief overview of how CERNBox functions as part of ScienceBox. Configuring and running CERNBox is a somewhat daunting task because of the many microservices and satellite components that run as part of the deployment. This section aims to make the CERNBox deployment easier to understand.

The architecture below depicts the CERNBox deployment.

As seen in the architecture, in order to run CERNBox on kubernetes, there are many components involved:

  • OCIS Proxy: Web proxy provided by ownCloud for incoming requests to the Reva services.
  • OCIS IDP: OAuth provider by ownCloud - Backed by an LDAP Server.
  • CERNBox Web: CERNBox Web component.
  • Reva Services:
    • Storage Services: Public, User and Home services are the CERNBox storage services that interface with EOS.
    • Auth Service: Bearer service is the CERNBox authentication service.
  • MariaDB: Database to store all the CERNBox share information.

All of the elements described above run as Kubernetes pods (deployment/statefulset) and interact with each other via the Kubernetes service mechanism.
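
To see these pods and services on a running cluster, standard kubectl queries are enough. The following is a minimal sketch: the function names are hypothetical, and the namespace is passed in by the caller because this section does not state which namespace the charts are installed into.

```shell
#!/bin/sh
# Allow the kubectl binary to be overridden (for example KUBECTL=echo).
KUBECTL="${KUBECTL:-kubectl}"

# List the deployments, statefulsets and services in the given namespace.
list_workloads() {
    "$KUBECTL" get deployments,statefulsets,services --namespace "$1"
}

# List the pods in the given namespace, including the node each runs on.
list_pods() {
    "$KUBECTL" get pods --namespace "$1" -o wide
}
```

For example, list_workloads default shows the workloads and services in the default namespace.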

3.3 - SWAN Charts

SWAN charts and description

This section gives a brief overview of how SWAN is used as part of ScienceBox. Running SWAN as part of ScienceBox is relatively easy since we “mimic” the upstream SWAN deployment. The upstream documentation for SWAN can be found here. ScienceBox uses the upstream SWAN charts and configures them to run with the custom OCIS IDP, which is also used by CERNBox, for authentication.

As seen in the architecture, whenever a request arrives at the /swan endpoint, the ingress routes it to the running SWAN instance, which then uses OCIS IDP for authentication and the EOS deployment as its storage backend. The components involved are:

  • OCIS IDP: OAuth Provider
  • EOS: Storage Provider
  • SWAN: Upstream SWAN charts

4 - Contribution Guidelines

How to contribute to ScienceBox

ScienceBox relies on Helm charts to template, package and deploy all the ScienceBox services. Helm charts help you define, install and upgrade Kubernetes applications.

All submissions, including submissions by project members, require review. We use GitHub pull requests for this purpose. Consult GitHub Help for more information on using pull requests.

Quick start with ScienceBox

Here’s a quick guide to get started with ScienceBox. It assumes you’re familiar with the GitHub workflow:

  1. Fork the ScienceBox repo on GitHub.
  2. Make your changes and send a pull request (PR).
  3. If you’re not yet ready for a review, add “WIP” to the PR name to indicate it’s a work in progress.
  4. Wait for the automated PR workflow to do some checks.
  5. Continue updating your PR and pushing your changes until you’re happy with the content.
  6. When you’re ready for a review, add a comment to the PR, and remove any “WIP” markers.

Previewing your changes locally

To preview your changes as you work, run your own local Kubernetes cluster as described below. Note: we suggest using Minikube to run and test your services.

  1. Follow the instructions in Getting started to clone and install ScienceBox and the other pre-requisite tools.

  2. Clone the Mboxed repo:

    git clone https://github.com/sciencebox/mboxed.git
    
  3. Edit the etc/deploy.sh file in mboxed to point the helm install command to the locally checked out ScienceBox charts.

  4. Run ./ScienceBox.sh to install the charts into your local kubernetes cluster for testing.

  5. Continue with the usual GitHub workflow to edit files, commit them, push the changes up to your fork, and create a pull request.
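
Before installing a locally modified chart, it can help to lint it and render its templates without touching the cluster. The sketch below uses standard Helm commands for this. The function name is hypothetical and the chart directory is supplied by the caller, since the location of the charts inside your checkout is not specified here.

```shell
#!/bin/sh
# Allow the helm binary to be overridden (for example HELM=echo for a dry run).
HELM="${HELM:-helm}"

# Fetch the sub-charts, lint the chart, then render its templates
# locally without installing anything into the cluster.
# $1 is the path to the locally checked out chart directory.
preview_chart() {
    "$HELM" dependency update "$1" &&
        "$HELM" lint "$1" &&
        "$HELM" template sciencebox "$1"
}
```

For example, preview_chart ./path/to/chart prints the rendered manifests if the chart passes linting, where the path is a placeholder for your local chart directory.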

Creating an issue

If you’ve found a problem in the docs, but you’re not sure how to fix it yourself, please create an issue in the ScienceBox repo. You can also create an issue about a specific page by clicking the Create Issue button in the top right hand corner of the page.

Useful resources