Hadoop

  • Launchpad Entry: server-o-hadoop

  • Created:

  • Contributors:

  • Packages affected:

Summary

Release Note

Rationale

User stories

As a Hadoop admin, I download CDH3 packages from Cloudera and can easily run them on Ubuntu Natty on physical systems.

Assumptions

As long as the big picture for Java is not sorted out, the goal of building Hadoop CDH3 packages from source (and thus making them available from main) is deferred.

Design

Focus on reviewing, testing, and sending improvements to the Cloudera packages.

Implementation

See the Work Items list in the associated blueprint.

Test/Demo Plan

Unresolved issues

BoF agenda and discussion

UDS Natty discussion

Provide Cloudera Hadoop packages (CDH3) on Natty


User story:

As a Hadoop admin, I download CDH3 packages from Cloudera and can easily run them on Ubuntu Natty on physical systems.

As long as the big picture for Java is not sorted out, the goal of building Hadoop CDH3 packages from source (and thus making them available from main) is deferred.

Review/test/QA more CDH3 packages (pig, hive, hue, oozie, ...).

Integration with the UEC installation service, logging, monitoring, and configuration management.

Scenario:
 * Logging and monitoring for UEC:
   - push all the monitoring data into HDFS and provide map/reduce jobs to extract valuable metrics.
 * Data mining and analysis.
 * Provide an Elastic MapReduce service in UEC.
 * Archiving.
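
The logging and monitoring scenario above could look roughly like the following sketch. It is not taken from the discussion: the "timestamp host metric value" record layout and the function names are assumptions made for illustration only, and the job runs in-process rather than on a cluster.

```python
from itertools import groupby
from operator import itemgetter


def map_line(line):
    """Map step: turn one monitoring record into a ((host, metric), value) pair.

    Assumes a hypothetical "timestamp host metric value" record layout.
    """
    fields = line.split()
    if len(fields) != 4:
        return  # skip malformed records
    _timestamp, host, metric, raw_value = fields
    try:
        value = float(raw_value)
    except ValueError:
        return  # skip records with a non-numeric value
    yield (host, metric), value


def reduce_values(key, values):
    """Reduce step: average all values seen for one (host, metric) key."""
    values = list(values)
    return key, sum(values) / len(values)


def run_job(lines):
    """Run map, shuffle (sort and group by key), and reduce in-process."""
    pairs = [pair for line in lines for pair in map_line(line)]
    pairs.sort(key=itemgetter(0))
    return dict(
        reduce_values(key, (value for _key, value in group))
        for key, group in groupby(pairs, key=itemgetter(0))
    )
```

On a real cluster the map and reduce steps would run as Hadoop jobs over data stored in HDFS (for example through Hadoop Streaming, which passes records over stdin and stdout); the sort-and-group step here only stands in for the shuffle phase.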

Tuning for each workload: how many CPUs and disks.

How to grow a cluster: split the NameNode and JobTracker onto separate systems.

How to tune a cluster for a workload?  Maybe that's part of the cluster management service.


Actions:
 * review other CDH packages for improvements in the user experience and Ubuntu integration:
    1. hbase
    2. pig
    3. hive
    4. hue
    5. oozie
    6. sqoop
 * review ZooKeeper patches and determine which of them should be integrated into the Debian/Ubuntu packages.
 * file bugs, write patches, and have them integrated into CDH3.
   * publish Hadoop packages into a PPA and point Cloudera to it for integration.
 * integration with installation service (whenever that is ready).


CategorySpec

ServerTeam/Specs/Natty/Hadoop (last edited 2011-05-06 12:51:51 by business-89-133-214-82)