Cerise Configuration¶

Introduction¶

Cerise takes configuration information from various sources, with some overriding others. This page describes the configuration files and what can be configured in them.

Main configuration file¶

The main configuration file is located at conf/config.yml, and contains general configuration information for the Cerise service in YAML format. It looks as follows:

database:
  file: run/cerise.db

logging:
  file: /var/log/cerise/cerise_backend.log
  level: INFO

pidfile: run/cerise_backend.pid

client-file-exchange:
  store-location-service: file:///tmp/cerise_files
  store-location-client: file:///tmp/cerise_files

rest-service:
  base-url: http://localhost:29593
  hostname: 127.0.0.1
  port: 29593

Cerise uses SQLite to persistently store the jobs that have been submitted to it. SQLite databases consist of a single file, the location of which is given by the file key under database.

Logging output is configured under the logging key. Make sure that the user that Cerise runs under has write access to the given path. If you want to log to /var/log without giving Cerise root rights, making the specified log file on beforehand and then giving ownership to the user Cerise runs under works well. Or you can make a subdirectory and give the user access to that.

The pidfile key specifies a path to a file into which Cerise’s process identifier (PID) is written. This can be used to shut down a running service, i.e. kill <pid> will cleanly shut down Cerise.

Under client-file-exchange, the means of communicating files between Cerise and its users is configured. Communication is done using a shared folder accessible to both the users and the Cerise service. If Cerise is running locally, both parties have access to the same file system, and see the shared folder in the same location. Thus, store-location-service and store-location-client both point to the same on-disk directory.

If the Cerise service does not share a file system with the client, then a directory on the Cerise server must be made available to the client, e.g. via WebDAV. In this case, client and service access the same directory using different URLs, e.g.

client-file-exchange:
  store-location-service: file:///home/webdav/files
  store-location-client: http://localhost:29593/files

The user is expected to submit references to files that start with the URL in store-location-client, Cerise will then fetch the corresponding files from the directory specified in store-location-service.

store-location-client can be overridden by specifying the environment variable CERISE_STORE_LOCATION_CLIENT. If you want to run multiple Cerise instances in containers, simultaneously, then you need to remap the ports on which they are available to avoid collisions. With this environment variable, the port can be easily injected into the container, removing the need to have a different image for each container. Cerise Client uses this functionality.

Finally, key rest-service has the hostname and port on which the REST service should listen, as well as the external URL on which it is available. If you want the service to be available to the outside world, this should be the IP address of the network adaptor to listen on, or 0.0.0.0 to listen on all adaptors. Note that a service running inside a Docker container needs to have 0.0.0.0 for it to be accessible from outside the container.

Since the service needs to pass URLs to the client sometimes, it needs to know at which URL it is available to the client. This is specified by base-url, which should contain the first part of the URL to the REST API, before the /jobs part. Alternatively, you can set the CERISE_BASE_URL environment variable to this value.

Compute resource configuration¶

Information on which compute resource to connect to, and how to transfer files and submit jobs to it, is stored separately from the main service configuration, to make it easier to create specialisations. Furthermore, to enable different users to use the same specialised Cerise installation (e.g. Docker image), credentials can be specified using environment variables. (Cerise Client uses the latter method.) If you are making a specialisation that is to be shared with others, do not put your credentials in this file!

Note: this file is somewhat outdated, but well be updated prior to the 1.0 release.

API configuration file¶

The API configuration file is located in api/config.yml, and has the following format:

compute-resource:
  credentials:
    username: None
    password: None
    certfile: None
    passphrase: None

  files:
    credentials:
      username: None
      password: None
      certfile: None
      passphrase: None

    protocol: local
    location: None
    path: /home/$CERISE_USERNAME/.cerise

  jobs:
    credentials:
      username: None
      password: None
      certfile: None
      passphrase: None

    protocol: local
    location: None
    scheduler: none

    queue-name: None      # cluster default
    slots-per-node: None  # cluster default
    cores-per-node: 32
    scheduler-options: None
    cwl-runner: $CERISE_API_FILES/cerise/cwltiny.py

  refresh: 10

This file describes the compute resource and how to connect to it. Under the files key, file access (staging) is configured, while the jobs key has settings on how to submit jobs. credentials, and keys username, password, certfile and passphrase occurring throughout, refer to credentials, and will be discussed below. Keys may be omitted if they are not needed, e.g. location may be omitted if protocol is local, in which case credentials may also me left out.

For file staging, a protocol, location and path may be specified. Supported protocols are file, sftp, ftp, or webdav, where file refers to direct access to the local file system.

location provides the host name to connect to; to run locally, this may be omitted or empty. path configures the remote directory where Cerise will put its files. It may contain the string $CERISE_USERNAME, which will be replaced with the user account name that the service is using. This is useful if you want to put Cerise’s files into the users home directory, e.g. /home/$CERISE_USERNAME/.cerise (which is the default value). Note that user’s home directories are not always in /home on compute clusters, so be sure to check this.

Job management is configured under the jobs key. Here too a protocol may be given, as well as a location, and a few other settings can be made.

For job management, the protocol can be local (default) or ssh. If the local protocol is selected, location is ignored, and jobs are run locally. For the ssh protocol, location is the name of the host, optionally followed by a colon and a port number (e.g. example.com:2222).

Jobs can be run directly or via a scheduler. To run jobs directly, either on the local machine or on some remote host via SSH, set the scheduler to none. Other valid values for scheduler are slurm, torque and gridengine to submit jobs to the respective job management system.

If jobs need to be sent to a particular queue, then you can pass the queue name using the corresponding option; if it is not specified, the default queue is used. If one or more of your steps start MPI jobs, then you may want to set the number of MPI slots per node via slots-per-node for better performance. If you need to specify additional scheduler options to e.g. select a GPU node, you can do so using e.g. scheduler-options: "-C TitanX --gres=gpu:1". Ideally, it would be possible to specify this in the CWL file for the step, but support for this in CWL is partial and in-development, and Cerise does not currently support this. Users can specify the number of cores to run on using a CWL ResourceRequirement, but Cerise always allocates whole nodes. It therefore needs to know the number of cores in each node, which you should specify using cores-per-node.

Finally, cwl-runner specifies the remote path to the CWL runner. It defaults to $CERISE_API_FILES/cerise/cwltiny.py, which is Cerise’s included simple CWL runner. $CERISE_API_FILES will be substituted for the appropriate remote directory by Cerise. See Specialising Cerise for more information.

Cerise will regularly poll the compute resource it is connected to, to check if any of the running jobs have finished. The refresh setting can be used to set the minimum interval in seconds between checks, so as to avoid putting too much load on the machine.

Credentials may be put into the configuration file as indicated. Valid combinations are:

No credentials at all (for running locally)
Only a username
A username and a password
A username and a certificate file
A username, a certificate file, and a passphrase

If the credentials to use for file access and job management are the same, then you should list them under credentials and omit them in the other locations. If different credentials are needed for files and jobs, then a credentials block can be specified under files and jobs respectively. Credentials listed here may be overridden by environment variables, as described below.

Environment variables¶

Cerise checks a set of environment variables for credentials. If found, they override the settings in the configuration file. These variables are:

General credentials

CERISE_USERNAME
CERISE_PASSWORD
CERISE_CERTFILE
CERISE_PASSPHRASE

Credentials for file access

CERISE_FILES_USERNAME
CERISE_FILES_PASSWORD
CERISE_FILES_CERTFILE
CERISE_FILES_PASSPHRASE

Credentials for job management

CERISE_JOBS_USERNAME
CERISE_JOBS_PASSWORD
CERISE_JOBS_CERTFILE
CERISE_JOBS_PASSPHRASE

As in the configuration file, specific credentials go before general ones. Cerise will first try a specific environment variable (e.g. CERISE_JOBS_USERNAME), then the corresponding specific configuration file entry (under jobs), then a generic environment variable (e.g. CERISE_USERNAME), and finally the generic configuration file entry (under credentials).

It does this for each of the four credential components separately, then uses the first complete combination from the top down to connect:

username + certfile + passphrase
username + certfile
username + password
username
<no credentials>