ServerKarmicCloudClusterComputing

Summary

The elastic nature of cloud computing is interesting to some traditional cluster computing customers whose workloads have tremendous spikes and valleys. When these people need computing resources, they need a lot of them (hundreds, potentially thousands of cores). But their workloads (often CPU-intensive models) do not need to run all the time, unlike a web server or mail server. For these reasons, the pay-as-you-go and elastic nature of cloud computing is a good match for them.

Ubuntu should package several open source projects that provide cluster computing capabilities on top of cloud computing resources.

Release Note

This section should include a paragraph describing the end-user impact of this change. It is meant to be included in the release notes of the first release in which it is implemented. (Not all of these will actually be included in the release notes, at the release manager's discretion; but writing them is a useful exercise.)

It is mandatory.

Rationale

This should cover the _why_: why is this change being proposed, what justifies it, where we see this justified.

User stories

Assumptions

Design

You can have subsections that better describe specific parts of the issue.

Implementation

This section should describe a plan of action (the "how") to implement the changes discussed. Could include subsections like:

UI Changes

Should cover changes required to the UI, or specific UI that is required to implement this change.

Code Changes

Code changes should include an overview of what needs to change, and in some cases even the specific details.

Migration

Include:

  • data migration, if any
  • redirects from old URLs to new ones, if any
  • how users will be pointed to the new way of doing things, if necessary.

Test/Demo Plan

It's important that we are able to test new features, and demonstrate them to users. Use this section to describe a short plan that anybody can follow that demonstrates the feature is working. This can then be used during testing, and to show off after release. Please add an entry to http://testcases.qa.ubuntu.com/Coverage/NewFeatures for tracking test coverage.

This need not be added or completed until the specification is nearing beta.

Unresolved issues

This should highlight any issues that should be addressed in further specifications, not problems with the specification itself, since any specification with problems cannot be approved.

UDS Raw Notes

  • Multiple nodes working together to complete a job.
  • MPICH2
    • Parallel programming library.
  • Condor
    • Allows installation of a process on multiple machines, enabling users to submit small jobs which are then executed on the best node.
    • Abstraction layer to combine multiple nodes into one?
  • High-performance computing (HPC) and the cloud may not be a good fit because of high latency.
    • You don't know where your data is, or how close together the nodes are.
    • Adding more resources dynamically is an advantage for HPC clusters in the cloud.
  • Condor and Eucalyptus would complement each other nicely for backfill processes.
    • Configure VMs to execute Condor jobs when the load is light.
  • Package Condor for Universe.
  • EGEE
    • Globus toolkit
      • Distributed computing.
    • gLite
      • Grid computing.
  • Clustered Samba
    • Could be great for Eucalyptus.
    • Provides high availability.
    • Need a clustered file system.
  • OpenPBS
    • Similar to LSF
    • Workload management software.
  • Torque
    • Resource Manager
  • Eucalyptus Clustered file systems
  • DBFS
    • High performance file system used in clusters.
  • Could use some type of billing log.
  • Might lose some efficiency from using a virtual environment, but this might be made up for by resell features.
  • Cloud cluster could allow access sooner because users may not have to wait as long for resources.
  • Predictable spending patterns.
  • Instance availability might be an advantage.
    • But increasing the number of VMs may increase the risk of losing data should an instance die.
  • Having access to alternate compilers.
  • Running HPC apps in EC2 may be a crazy idea.
    • On the other hand the capacity of EC2 may be far higher than local cloud.
    • HPC on a local cloud may not be as bad.
  • There may be reliability issues with using the EC2 cloud.
  • Advantages are checkpointing and live migration.
  • An appliance with Torque, a clustered file system, etc. would be great.
  • Network storage is very important to a local cloud.
  • Clustered File Systems
    • OCFS2
    • GFS
  • Network Block Devices
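
The backfill idea in the notes above (let VMs execute Condor jobs only when the load is light) can be sketched as a simple load check. This is a minimal illustration, not how Condor or Eucalyptus actually decide: the function names and the 0.25 per-core threshold are hypothetical, and a real deployment would express the policy in the scheduler's own configuration.

```python
import os


def is_lightly_loaded(load_1min, cpu_count, threshold=0.25):
    """Return True when the 1-minute load per core is below the threshold.

    The threshold is a hypothetical value chosen for illustration only.
    """
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return (load_1min / cpu_count) < threshold


def should_accept_backfill(threshold=0.25):
    """Check the local machine's load (Unix only) against the threshold."""
    load_1min, _, _ = os.getloadavg()
    return is_lightly_loaded(load_1min, os.cpu_count() or 1, threshold)


if __name__ == "__main__":
    print("accept backfill jobs:", should_accept_backfill())
```

Keeping the decision in a pure function (is_lightly_loaded) separate from the system call makes the policy easy to test with fixed numbers.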


CategorySpec

ServerKarmicCloudClusterComputing (last edited 2009-06-03 18:50:05 by cpe-66-69-254-183)