Platform Development

This document provides a list of questions relevant when thinking about developing new platform (lab + compute) infrastructure.

Platform Scope

  • Who are the stakeholders (users, support, decision makers)?
  • What will the platform deliver - is there a written description?
    • Services
      • What does the service deliver?
      • Will the platform operate a service on behalf of users, will users come to the facility and use it themselves, or both?
        • The answer to this question has operational impact at many levels (e.g. computer accounts).
      • Fee structure for service (including quality of service priorities)
      • Data/Analysis
      • Education and training
    • What are the measures of success for the platform (e.g. new capabilities, enhanced services, interoperability)?
  • What are the basic use-cases that can be used to drive the research computing needs?

Workflow

  • What is the logical workflow of the data (acquisition, storage (backup), curation, distribution, processing)?
    • Understanding how the data will be consumed helps determine what infrastructure is required, and where, to handle it.
    • This helps drive an understanding of whether storage/compute could be provisioned offsite (e.g. by ITS).
    • See the Management of Data section below for more details on data.
  • What are the access/security/privacy/ethics requirements for your data at all stages of the workflow?
    • It is very important to understand these at the beginning, as they constrain the possible technical solutions.
    • There is a perception that if data are stored off site, they are less secure. This is not necessarily the case and in fact the opposite may be true (e.g. high-level data centres are physically and logically very secure as they are managed by specialists).

Acquisition of Data

  • What acquisition computers will be required?
    • Are they specialized (e.g. a console that comes with a microscopy system) or generic?
  • How are the acquisition systems to be supported?
    • Vendor
    • Local IT
    • Scientist support
    • Is annual software or hardware maintenance required?
  • Do the data need pre-processing after initial acquisition (e.g. quality control)?

Storage of Data

  • Will the data need to be stored locally (at least initially) with the instrument?
    • A very high bandwidth connection may be needed between the instrument and initial storage (e.g. MR acquisitions).
    • Data may then be cached locally, and some or all of the data moved elsewhere (e.g. offsite).
  • Can the data be stored offsite?
  • How much raw data will the facility acquire per annum?
  • What level of availability do you need for your data? For example, is a very high level of availability required for time-critical experiments?
  • Do you need to keep all of the data that you acquire? Sometimes raw data are processed into end-user products and the raw data can then be discarded.
  • Do you need backups of your data?
    • Where?
    • How often?
    • Which data need to be backed up?
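
The sizing questions above (raw data per annum, how much is kept, how many backup copies, and how fast data must move off the instrument) can be turned into a rough estimate. The following is a minimal sketch only: the function names, parameters, and the default 70% link efficiency are illustrative assumptions, not figures from any particular facility.

```python
def projected_storage_tb(raw_tb_per_year, years, processed_ratio=0.0,
                         raw_retained_fraction=1.0, backup_copies=1):
    """Rough total storage (TB) needed after `years` of operation.

    All parameters are hypothetical planning inputs:
      processed_ratio       - processed data produced per TB of raw data
      raw_retained_fraction - share of raw data kept after processing
      backup_copies         - number of extra full copies held as backup
    """
    primary_per_year = raw_tb_per_year * (raw_retained_fraction + processed_ratio)
    return primary_per_year * years * (1 + backup_copies)


def transfer_hours(size_tb, link_gbit_per_s, efficiency=0.7):
    """Hours to move `size_tb` over a network link.

    `efficiency` is an assumed usable fraction of the nominal bandwidth.
    """
    bits = size_tb * 8e12  # 1 TB = 10^12 bytes = 8 * 10^12 bits
    return bits / (link_gbit_per_s * 1e9 * efficiency) / 3600


if __name__ == "__main__":
    # Example inputs are made up for illustration.
    print(projected_storage_tb(10, 5, processed_ratio=0.5))
    print(round(transfer_hours(1, 1), 2))
```

Varying `raw_retained_fraction` and `backup_copies` shows quickly how much the answers to the retention and backup questions change the storage that has to be funded.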

Processing/analysis

  • Will the platform process the raw data into processed data products?
  • How much additional data per annum will be produced in this way?
  • Where will the processed data be stored, and do they also need backup?
  • Will you process data on behalf of end users?
  • Will end-users come to the facility to process data?
  • What computers do you need to process data?
  • What software do you need to process data?
  • What operating systems do you need to run the processing software?
  • Are there any licensing issues with the processing software?

Management of Data

  • Data management is largely about long-lived processes for preserving and accessing data (it includes some of the topics in the previous sections).
  • Data management also covers storing the metadata that describe the data, as well as the data themselves.
  • What value do you place on the data? For example, can data be re-acquired if need be, or should data always be preserved?
  • Data Management Plans
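
As an illustration of storing metadata alongside the data, the sketch below writes a small descriptive record as JSON that could sit next to a dataset. Every field name here is a hypothetical example chosen for illustration, not a prescribed schema; a real platform would take its fields from its own data management plan.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class DatasetRecord:
    """Hypothetical minimal metadata describing one acquired dataset."""
    dataset_id: str
    instrument: str
    acquired_on: str          # ISO 8601 date, e.g. "2011-06-30"
    owner: str
    reacquirable: bool        # can the data be re-acquired if lost?
    retention_years: int      # how long the data should be preserved
    keywords: list = field(default_factory=list)


def to_sidecar_json(record):
    """Serialise the record to JSON text suitable for a sidecar file."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


if __name__ == "__main__":
    # Example values are made up for illustration.
    record = DatasetRecord(
        dataset_id="example-0001",
        instrument="example-microscope",
        acquired_on="2011-06-30",
        owner="example-user",
        reacquirable=False,
        retention_years=7,
        keywords=["example"],
    )
    print(to_sidecar_json(record))
```

Recording `reacquirable` and `retention_years` with each dataset keeps the answer to the value question attached to the data, where later backup and preservation decisions can use it.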

Distribution of Data

  • What is the end point of the data (e.g. end users, local staff access)?
  • How will the data reach its end point (local access, download from portal, CD/DVD, network distribution)?
  • Are there security questions to address regarding the distribution of data?

Support

  • What skills do you need to support the computational infrastructure?
  • Do you have these skills in your team or a team that you have access to?
  • Do you have a resource identified to handle data operations?
  • Do you need to develop software?
    • If so, at what level - full system development, integration, deployment?
    • Do you need ongoing development/modification of the system?

Funding Planning

  • Storage and compute infrastructure
  • Capital Refresh
  • Hardware maintenance
  • Software maintenance
  • Software development/integration
  • Operations (including data operations)

-- NeilKilleen - 2011-06-30

Topic attachments

  • dataManagementPlanning.docx (15.1 K, 2011-07-05, NeilKilleen): Data Management Planning Document (Andy Tseng)
Topic revision: r9 - 2011-07-07 - NeilKilleen
 