This document lists questions to consider when developing new platform (lab + compute) infrastructure.
Platform Scope
Who are the stakeholders (users, support, decision makers)?
What will the platform deliver? Is there a written description?
Services
What does the service deliver?
Will the platform operate a service on behalf of users, will users come to the facility and use it themselves, or both?
The answer to this question has operational impact at many levels (e.g. computer accounts).
Fee structure for service (including quality of service priorities)
Data/Analysis
Education and training
What are the measures of success for the platform (e.g. new capabilities, enhanced services, interoperability)?
What are the basic use-cases that can be used to drive the research computing needs?
Workflow
What is the logical workflow of the data (acquisition, storage and backup, curation, distribution, processing)?
Understanding how the data will be consumed determines what infrastructure is required, and where, to handle it.
It also helps establish whether storage/compute could be provisioned offsite (e.g. by ITS).
See the section on Management of Data below for more details.
What are the access/security/privacy/ethics requirements for your data at all stages of the workflow?
It is very important to understand these at the beginning, because they constrain the possible technical solutions.
There is a perception that data stored offsite are less secure. This is not necessarily the case, and the opposite may in fact be true (e.g. high-level data centres are physically and logically very secure because they are managed by specialists).
Acquisition of Data
What acquisition computers will be required?
Are they specialized (e.g. a console that comes with a microscopy system) or generic?
How are the acquisition systems to be supported?
Vendor
Local IT
Scientist support
Is annual software or hardware maintenance required?
Do the data need pre-processing after initial acquisition (e.g. quality control)?
Storage of Data
Will the data need to be stored locally (at least initially) with the instrument?
A very high bandwidth connection may be needed between the instrument and initial storage (e.g. MR acquisitions).
Data may then be cached locally, and some or all of the data moved elsewhere (e.g. offsite).
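As a rough illustration of the bandwidth question above, the link speed needed between an instrument and its initial storage can be estimated from the size of one acquisition and the time available to move it. The following is a minimal sketch only: the function name, the 70% link-efficiency default and the example numbers are illustrative assumptions, not figures for any real instrument.

```python
def required_link_gbps(acquisition_gb, transfer_window_s, link_efficiency=0.7):
    """Estimate the nominal link speed (Gbit/s) needed to move one acquisition.

    acquisition_gb    -- size of one acquisition in gigabytes (10**9 bytes)
    transfer_window_s -- time available to move it, in seconds
    link_efficiency   -- assumed fraction of nominal speed achieved in practice
    """
    if transfer_window_s <= 0:
        raise ValueError("transfer_window_s must be positive")
    if not 0 < link_efficiency <= 1:
        raise ValueError("link_efficiency must be in (0, 1]")
    gigabits = acquisition_gb * 8  # 8 bits per byte
    return gigabits / transfer_window_s / link_efficiency


# Hypothetical example: a 50 GB acquisition that must leave the instrument within 5 minutes.
print(f"{required_link_gbps(50, 300):.2f} Gbit/s")
```

With these assumed numbers the estimate is about 1.9 Gbit/s, so a 1 Gbit/s connection would not keep up and either a faster link or local caching would be needed.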
Can the data be stored offsite?
How much raw data will the facility acquire per annum?
What level of availability do you need for your data? For example, is a very high level of availability required for time-critical experiments?
Do you need to keep all of the data that you acquire? Sometimes raw data are processed into end-user products and the raw data can then be discarded.
Do you need backups of your data?
Where?
How often?
Which data need to be backed up?
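The answers to the volume, retention and backup questions above combine into a first-order storage estimate. The sketch below is one simple way to do that arithmetic; the function and parameter names are hypothetical, and it assumes a constant acquisition rate and full (not incremental or deduplicated) backup copies.

```python
def storage_needed_tb(raw_tb_per_year, years, kept_fraction=1.0, backup_copies=1):
    """Estimate total storage (TB) needed after `years` of operation.

    raw_tb_per_year -- raw data acquired per annum, in terabytes
    years           -- planning horizon in years
    kept_fraction   -- fraction of raw data retained (1.0 keeps everything)
    backup_copies   -- number of full backup copies kept besides the primary
    """
    if not 0 <= kept_fraction <= 1:
        raise ValueError("kept_fraction must be between 0 and 1")
    if backup_copies < 0:
        raise ValueError("backup_copies must not be negative")
    primary = raw_tb_per_year * years * kept_fraction
    # Each backup copy is assumed to be a full copy of the primary data.
    return primary * (1 + backup_copies)


# Hypothetical example: 20 TB/year for 5 years, half retained, two backup copies.
print(storage_needed_tb(20, 5, kept_fraction=0.5, backup_copies=2))
```

Varying `kept_fraction` and `backup_copies` shows quickly how much a decision to discard raw data, or to back up only part of it, changes the total.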
Processing/analysis
Will the platform process the raw data into processed data products?
How much additional data per annum will be produced in this way?
Where will the processed data be stored, and do they also need to be backed up?
Will you process data on behalf of end users?
Will end-users come to the facility to process data?
What computers do you need to process data?
What software do you need to process data?
What operating systems do you need to run the processing software?
Are there any licensing issues with the processing software?
Management of Data
Data management is largely about long-lived processes for preserving and accessing data (some of the topics in the previous sections fall under this).
Data management also addresses storing metadata that describe the data, alongside the data themselves.
What value do you place on the data? For example, can data be re-acquired if need be, or must they always be preserved?
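To make the idea of metadata stored with the data concrete, the sketch below defines a small record that could be written as a JSON file next to each dataset. Every field name and value here is a hypothetical illustration rather than a recommended schema; a real platform would choose fields to match its own access, value and preservation requirements.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class DatasetRecord:
    """Hypothetical metadata describing one acquired dataset."""
    dataset_id: str
    instrument: str
    acquired_on: str          # ISO 8601 date, e.g. "2024-01-31"
    owner: str
    can_be_reacquired: bool   # informs how much preservation effort is justified
    access_level: str = "restricted"
    keywords: list = field(default_factory=list)

    def to_json(self):
        # Serialise with sorted keys so records are easy to compare and diff.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Illustrative values only.
record = DatasetRecord(
    dataset_id="example-0001",
    instrument="microscope-A",
    acquired_on="2024-01-31",
    owner="example-lab",
    can_be_reacquired=False,
    keywords=["pilot"],
)
print(record.to_json())
```

Recording whether a dataset can be re-acquired alongside the data themselves keeps the answer to the value question available when retention or backup decisions are revisited later.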