cerise.back_end package

Submodules

cerise.back_end.cwl module

cerise.back_end.cwl.get_cwltool_result(cwltool_log: str) → cerise.job_store.job_state.JobState[source]

Parses cwltool log output and returns a JobState object describing the outcome of the cwl execution.

Parameters:cwltool_log – The standard error output of cwltool
Returns:Any of JobState.PERMANENT_FAILURE, JobState.TEMPORARY_FAILURE or JobState.SUCCESS, or JobState.SYSTEM_ERROR if the output could not be interpreted.
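
The mapping described above can be sketched as follows. This is a minimal illustration, not Cerise's actual implementation: it assumes that cwltool reports the outcome on standard error in a line of the form "Final process status is ...", the local JobState enum is only a stand-in for cerise.job_store.job_state.JobState, and the name classify_cwltool_log is hypothetical.

```python
from enum import Enum


class JobState(Enum):
    """Stand-in for cerise.job_store.job_state.JobState (subset only)."""
    SUCCESS = 'Success'
    PERMANENT_FAILURE = 'PermanentFailure'
    TEMPORARY_FAILURE = 'TemporaryFailure'
    SYSTEM_ERROR = 'SystemError'


# Assumed status strings; adjust if the runner words them differently.
_STATUS_MARKERS = {
    'Final process status is success': JobState.SUCCESS,
    'Final process status is permanentFail': JobState.PERMANENT_FAILURE,
    'Final process status is temporaryFail': JobState.TEMPORARY_FAILURE,
}


def classify_cwltool_log(cwltool_log: str) -> JobState:
    """Map a cwltool stderr log to a job state (hypothetical helper)."""
    for marker, state in _STATUS_MARKERS.items():
        if marker in cwltool_log:
            return state
    # Nothing recognisable found: the log could not be interpreted.
    return JobState.SYSTEM_ERROR
```
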
cerise.back_end.cwl.get_files_from_binding(cwl_binding: Dict[str, Any]) → List[cerise.back_end.file.File][source]

Parses a CWL input or output binding and returns a list containing name: path pairs. Any non-File objects are omitted.

Parameters:cwl_binding – A dict structure parsed from a JSON CWL binding
Returns:A list of File objects describing the files in the binding.
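
To make the filtering concrete, here is a minimal sketch of extracting File entries from a parsed binding. It is an illustration under assumptions rather than the library's code: FileRef is a simplified stand-in for cerise.back_end.file.File, files_from_binding is a hypothetical name, and every File entry is assumed to carry a location field.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class FileRef:
    """Simplified stand-in for cerise.back_end.file.File."""
    name: str
    index: Optional[int]
    location: str


def files_from_binding(cwl_binding: Dict[str, Any]) -> List[FileRef]:
    """Collect File entries from a parsed binding (hypothetical helper)."""
    result = []
    for name, value in cwl_binding.items():
        if isinstance(value, dict) and value.get('class') == 'File':
            # A single file: there is no array index.
            result.append(FileRef(name, None, value['location']))
        elif isinstance(value, list):
            # An array input: keep the position of each File in the array.
            for i, item in enumerate(value):
                if isinstance(item, dict) and item.get('class') == 'File':
                    result.append(FileRef(name, i, item['location']))
        # Anything else (numbers, strings, ...) is not a File and is skipped.
    return result
```
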
cerise.back_end.cwl.get_required_num_cores(cwl_content: bytes) → int[source]

Takes the contents of a CWL file and extracts the number of cores required.

Parameters:cwl_content – The contents of a CWL file.
Returns:The number of cores required, or 0 if not specified.
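
A rough sketch of such an extraction is shown below. It rests on several assumptions that the text above does not confirm: the document is given as JSON (so that only the standard library is needed), the core count is taken from the coresMin field of a ResourceRequirement, and that field holds a plain number. The function names are hypothetical.

```python
import json
from typing import Any, Dict, Iterator


def _iter_requirements(section: Any) -> Iterator[Dict[str, Any]]:
    """Yield class-tagged requirement dicts from list or map notation."""
    if isinstance(section, list):
        for item in section:
            if isinstance(item, dict):
                yield item
    elif isinstance(section, dict):
        for class_name, body in section.items():
            if isinstance(body, dict):
                yield {'class': class_name, **body}


def required_num_cores(cwl_content: bytes) -> int:
    """Return coresMin of a ResourceRequirement, or 0 (hypothetical helper)."""
    document = json.loads(cwl_content.decode('utf-8'))
    for key in ('requirements', 'hints'):
        for req in _iter_requirements(document.get(key)):
            if req.get('class') == 'ResourceRequirement':
                return int(req.get('coresMin', 0))
    # No resource requirement given: 0 means "not specified".
    return 0
```
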
cerise.back_end.cwl.get_secondary_files(secondary_files: List[Dict[str, Any]]) → List[cerise.back_end.file.File][source]

Parses a list of secondary files, recursively.

Parameters:secondary_files – A list of values from a CWL secondaryFiles attribute.
Returns:A list of secondary input files.
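
The recursion can be illustrated with a short sketch. FileRef is again a simplified stand-in for cerise.back_end.file.File, secondary_file_refs is a hypothetical name, and the entries are assumed to be parsed CWL File objects with a location field and an optional nested secondaryFiles list.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class FileRef:
    """Simplified stand-in for cerise.back_end.file.File."""
    location: str
    secondary_files: List['FileRef'] = field(default_factory=list)


def secondary_file_refs(secondary_files: List[Dict[str, Any]]) -> List[FileRef]:
    """Convert secondaryFiles entries to FileRefs (hypothetical helper)."""
    result = []
    for entry in secondary_files:
        if entry.get('class') != 'File':
            # Non-File entries (e.g. Directory) are skipped in this sketch.
            continue
        # Secondary files may themselves have secondary files: recurse.
        nested = secondary_file_refs(entry.get('secondaryFiles', []))
        result.append(FileRef(entry['location'], nested))
    return result
```
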
cerise.back_end.cwl.get_time_limit(cwl_content: bytes) → int[source]

Takes the contents of a CWL file and extracts the cwl1.1-dev1 time limit.

Supports only two of three possible ways of writing this. Returns 0 if no value was specified, in which case the default should be used.

Parameters:cwl_content – The contents of a CWL file.
Returns:Time to reserve in seconds.
cerise.back_end.cwl.get_workflow_step_names(workflow_content: bytes) → List[str][source]

Takes a CWL workflow and extracts the names of its steps.

This assumes that the steps are not inlined, but referenced by name, as we require for workflows submitted to Cerise. Also, this is not the name of the step in the workflow document, but the name of the step in the API to run. It’s the content of the run attribute, not that of the id attribute.

Parameters:workflow_content – The contents of the workflow file.
Returns:A list of step names.
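
The difference between the run and id attributes is easiest to see in a small sketch. The code below is an assumption-laden illustration, not the library's code: it reads the workflow as JSON to avoid a YAML dependency, workflow_step_names is a hypothetical name, and test/wc.cwl is an invented step name.

```python
import json
from typing import List


def workflow_step_names(workflow_content: bytes) -> List[str]:
    """Return the 'run' value of every step (hypothetical helper)."""
    document = json.loads(workflow_content.decode('utf-8'))
    steps = document.get('steps', [])
    # CWL allows steps as a map (id -> step) or as a list of steps with ids.
    step_list = steps.values() if isinstance(steps, dict) else steps
    names = []
    for step in step_list:
        run = step.get('run')
        # Inlined steps have a dict here; only named references are kept.
        if isinstance(run, str):
            names.append(run)
    return names
```

For a step written as "count": {"run": "test/wc.cwl"}, this returns test/wc.cwl (the run attribute), not count (the id).
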
cerise.back_end.cwl.is_workflow(workflow_content: bytes) → bool[source]

Takes CWL file contents and checks whether it is a CWL Workflow (and not an ExpressionTool or CommandLineTool).

Parameters:workflow_content – The contents of a CWL file.
Returns:True iff the top-level Process in this CWL file is an instance of Workflow.

cerise.back_end.execution_manager module

class cerise.back_end.execution_manager.ExecutionManager(config: cerise.config.Config, local_api_dir: cerulean.path.Path)[source]

Bases: object

Handles the execution of jobs on the remote resource. The execution manager monitors the job store for jobs that are ready to be staged in, started, cancelled, staged out, or deleted, and performs the required activity. It also monitors the remote resource, ensuring that any remote state changes are propagated to the job store correctly.

Set up the execution manager.

Parameters:
  • config – The configuration.
  • local_api_dir – The path to the local API directory.
execute_jobs() → None[source]

Run the main backend execution loop.

This repeatedly processes jobs, but does not check the remote compute resource more often than specified in the remote_refresh configuration parameter.
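
The rate limiting described here can be sketched as a loop that remembers when it last contacted the remote resource. This is only an illustration of the idea: run_loop and its callback parameters are hypothetical, and the clock and sleep functions are injectable purely so that the sketch can be exercised without waiting.

```python
import time
from typing import Callable, Optional


def run_loop(process_local: Callable[[], None],
             check_remote: Callable[[], None],
             should_stop: Callable[[], bool],
             remote_refresh: float,
             clock: Callable[[], float] = time.monotonic,
             sleep: Callable[[float], None] = time.sleep) -> None:
    """Process jobs until asked to stop, rate-limiting remote checks."""
    last_remote_check: Optional[float] = None
    while not should_stop():
        now = clock()
        # Only contact the remote resource if remote_refresh seconds
        # have passed since the previous check (or on the first pass).
        if last_remote_check is None or now - last_remote_check >= remote_refresh:
            check_remote()
            last_remote_check = now
        process_local()
        # Short pause so the loop does not spin at full speed.
        sleep(0.1)
```
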

shutdown() → None[source]

Requests the execution manager to execute a clean shutdown.

cerise.back_end.file module

class cerise.back_end.file.File(name: Optional[str], index: Optional[int], location: str, secondary_files: List[File])[source]

Bases: object

Create a File object.

This describes a file, and is the result of resolving input files from the user-submitted input description, or output generated by the CWL runner. It is used by the staging machinery to stage these files, and update the input description with remote paths.

Parameters:
  • name – The name of the input this file belongs to.
  • index – The index of this file into an array of Files.
  • location – A URL with the (local) location of the file.
  • secondary_files – A list of secondary files.
index = None

The index of this file, if it is in an array of files.

location = None

Local URL of the file.

name = None

The name of the input this file belongs to.

secondary_files = None

CWL secondary files.

source = None

The source of the file.

cerise.back_end.job_planner module

exception cerise.back_end.job_planner.InvalidJobError[source]

Bases: RuntimeError

class cerise.back_end.job_planner.JobPlanner(job_store: cerise.job_store.sqlite_job_store.SQLiteJobStore, local_api_dir: cerulean.path.Path)[source]

Bases: object

Handles workflow execution requirements.

This class keeps track of which hardware is needed for each available step, then analyses a workflow and decides which resources it needs based on this.

Create a JobPlanner.

Parameters:
  • job_store – The job store to act on.
  • local_api_dir – Path of local api directory.
plan_job(job_id: str) → None[source]

Figures out which resources a job needs.

Resources are identified by strings. Currently, there is num_cores, the number of cores to run on, and time_limit, the amount of time to reserve in seconds.

Parameters:job_id – Id of the job to plan.

cerise.back_end.job_runner module

class cerise.back_end.job_runner.JobRunner(job_store: cerise.job_store.sqlite_job_store.SQLiteJobStore, config: cerise.config.Config, remote_cwlrunner: str)[source]

Bases: object

Create a JobRunner object.

Parameters:
  • job_store – The job store to get jobs from.
  • config – The configuration.
  • remote_cwlrunner – The location of the CWL runner to use.
cancel_job(job_id: str) → bool[source]

Cancel a running job.

Job must be cancellable, i.e. in JobState.RUNNING or JobState.WAITING. If it isn’t cancellable, this function does nothing.

Cancellation may not happen immediately. If the cancellation request has been executed immediately and the job is now gone, this function returns False. If the job will be cancelled soon, it returns True.

Parameters:job_id – The id of the job to cancel.
Returns:Whether the job is still running.
start_job(job_id: str) → None[source]

Get a job from the job store and start it on the compute resource.

Parameters:job_id – The id of the job to start.
update_job(job_id: str) → None[source]

Get status from compute resource and update store.

Parameters:job_id – ID of the job to get the status of.

cerise.back_end.local_files module

class cerise.back_end.local_files.LocalFiles(job_store: cerise.job_store.sqlite_job_store.SQLiteJobStore, config: cerise.config.Config)[source]

Bases: object

Create a LocalFiles object. Sets up local directory structure as well.

Parameters:
  • job_store – The job store to use
  • config – The configuration.
create_output_dir(job_id: str) → None[source]

Create an output directory for a job.

Parameters:job_id – The id of the job to make an output directory for.
delete_output_dir(job_id: str) → None[source]

Delete the output directory for a job. This will remove the directory and everything in it.

Parameters:job_id – The id of the job whose output directory to delete.
publish_job_output(job_id: str, output_files: List[cerise.back_end.file.File]) → None[source]

Write output files to the local output dir for this job.

Uses the .output_files property of the job to get data, and updates its .output property with URLs pointing to the newly published files, then sets .output_files to None.

Parameters:
  • job_id – The id of the job whose output to publish.
  • output_files – List of output files to publish.
resolve_input(job_id: str) → List[cerise.back_end.file.File][source]

Resolves input (workflow and input files) for a job.

This function will read the job from the database, add a .workflow_content attribute with the contents of the workflow, and return a list of File objects describing the input files.

This function will accept local file:// URLs as well as remote http:// URLs.

Parameters:job_id – The id of the job whose input to resolve.
Returns:A list of File objects to stage.
resolve_secondary_files(secondary_files: List[cerise.back_end.file.File]) → None[source]

Makes a File object for each secondary file.

Works recursively, so nested secondaryFiles work.

Parameters:secondary_files – List of secondary files.
Returns:Resulting Files, with contents.

cerise.back_end.remote_api module

class cerise.back_end.remote_api.RemoteApi(config: cerise.config.Config, local_api_dir: cerulean.path.Path)[source]

Bases: object

Manages the remote API installation.

This class manages the remote directories in which the CWL API is installed, which is <basedir>/api/

Within this, there is a directory per project, with entries:

  • <project>/version
  • <project>/steps/…
  • <project>/files/…
  • <project>/install.sh

Create a RemoteApi object. Sets up remote directory structure as well, but refuses to create the top-level directory.

Parameters:
  • config – The configuration.
  • local_api_dir – The path to the local API dir to install from.
get_projects() → List[str][source]

Return names and versions of the installed projects.

Returns:A list of strings, one for each project, with name and version.
install() → None[source]

Install the API onto the compute resource.

Copies the steps/ and files/ subdirectories of the given local API dir to the compute resource, and runs the install script.

translate_runner_location(runner_location: str) → str[source]

Perform macro substitution on CWL runner location.

This replaces $CERISE_API with the API base dir.

Parameters:runner_location (str) – Location of the runner as configured by the user.
Returns:(str) A remote path with variables substituted.
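
The substitution amounts to a plain string replacement, sketched below. The function name substitute_api_dir and its second parameter are hypothetical (the real method takes the base dir from its own state), and the paths in the usage line are invented.

```python
def substitute_api_dir(runner_location: str, api_base_dir: str) -> str:
    """Replace the $CERISE_API macro with the API base directory."""
    return runner_location.replace('$CERISE_API', api_base_dir)


# Example with made-up paths:
resolved = substitute_api_dir('$CERISE_API/files/runner.sh', '/home/user/.cerise/api')
```
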
translate_workflow(workflow_content: bytes) → bytes[source]

Parse workflow content, check that it calls steps, and insert the location of the steps on the remote resource so that the remote runner can find them.

Also converts YAML to JSON, for cwltiny compatibility.

Parameters:workflow_content – The raw workflow data
Returns:The modified workflow data, serialised as JSON
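
A simplified sketch of this translation follows. It is not the library's implementation: it only accepts JSON input (the method above also handles YAML), it joins each step name onto a single hypothetical remote_steps_dir rather than reproducing the per-project layout, and insert_step_locations is an invented name.

```python
import json
import posixpath


def insert_step_locations(workflow_content: bytes,
                          remote_steps_dir: str) -> bytes:
    """Prefix each step's 'run' with a remote directory (hypothetical)."""
    document = json.loads(workflow_content.decode('utf-8'))
    steps = document.get('steps')
    if not steps:
        raise ValueError('Workflow does not call any steps')
    # Steps may be a map (id -> step) or a list of steps.
    step_list = steps.values() if isinstance(steps, dict) else steps
    for step in step_list:
        if not isinstance(step.get('run'), str):
            raise ValueError('Steps must be referenced by name, not inlined')
        step['run'] = posixpath.join(remote_steps_dir, step['run'])
    # Serialise as JSON regardless of the input notation.
    return json.dumps(document).encode('utf-8')
```
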
update_available() → bool[source]

Returns whether the remote API is older than the local one.

Returns:True iff an update is available/required.

cerise.back_end.remote_job_files module

class cerise.back_end.remote_job_files.RemoteJobFiles(job_store: cerise.job_store.sqlite_job_store.SQLiteJobStore, config: cerise.config.Config)[source]

Bases: object

Manages a remote directory structure. Expects to be given a remote dir to work within. Inside this directory, it makes a jobs/ directory, and inside that there is a directory for every job.

Within each job directory are the following files:

  • jobs/<job_id>/name.txt contains the user-given name of the job
  • jobs/<job_id>/workflow.cwl contains the workflow to run
  • jobs/<job_id>/work/ contains input and output files, and is the working directory for the job.
  • jobs/<job_id>/stdout.txt is the standard output of the CWL runner
  • jobs/<job_id>/stderr.txt is the standard error of the CWL runner
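
The layout listed above can be written down as a small path helper. This is a sketch only: job_paths and the dictionary keys are hypothetical, and the base directory and job id used in testing it are invented.

```python
from pathlib import PurePosixPath
from typing import Dict


def job_paths(remote_dir: str, job_id: str) -> Dict[str, PurePosixPath]:
    """Build the remote paths belonging to one job (hypothetical helper)."""
    job_dir = PurePosixPath(remote_dir) / 'jobs' / job_id
    return {
        'name': job_dir / 'name.txt',          # user-given job name
        'workflow': job_dir / 'workflow.cwl',  # workflow to run
        'work': job_dir / 'work',              # inputs, outputs, working dir
        'stdout': job_dir / 'stdout.txt',      # CWL runner standard output
        'stderr': job_dir / 'stderr.txt',      # CWL runner standard error
    }
```
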

Create a RemoteJobFiles object. Sets up remote directory structure as well, but refuses to create the top-level directory.

Parameters:
  • job_store – The job store to use.
  • config – The configuration.
delete_job(job_id: str) → None[source]

Remove the work directory for a job. This will remove the directory and everything in it, if it exists.

Parameters:job_id – The id of the job whose work directory to delete.
destage_job_output(job_id: str) → List[cerise.back_end.file.File][source]

Download results of the given job from the compute resource.

Parameters:job_id – The id of the job to download results of.
Returns:A list of File objects describing the downloaded output files.
stage_job(job_id: str, input_files: List[cerise.back_end.file.File], workflow_content: bytes) → None[source]

Stage a job. Copies any necessary files to the remote resource.

Parameters:
  • job_id – The id of the job to stage
  • input_files – A list of input files to stage.
  • workflow_content – Translated contents of the workflow to be run.
update_job(job_id: str) → None[source]

Get status from remote resource and update store.

Parameters:job_id – ID of the job to get the status of.

Module contents