日期:2014-05-16  浏览次数:20465 次

Giraph源码分析 —— 启动ZooKeeper服务

说明:

(1) 实验环境.

   三台服务器:test165、test62、test63。test165同时是JobTracker和TaskTracker.

   测试例子:官网自带的SSSP程序,数据是自己模拟生成。

   运行命令:hadoop jar giraph-examples-1.0.0-for-hadoop-0.20.203.0-jar-with-dependencies.jar org.apache.giraph.GiraphRunner org.apache.giraph.examples.SimpleShortestPathsVertex -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat -vip /user/giraph/SSSP -of org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op /user/giraph/output-sssp-debug-7 -w 5

(2). 为节约空间,下文中所有代码均为核心代码片段。

(3). core-site.xml中hadoop.tmp.dir的路径设为:/home/hadoop/hadooptmp

(4).写本文是多次调试完成的,故文中的JobID不一样,读者可理解为同一JobID.

(5). 后续文章也遵循上述规则。

1. org.apache.giraph.graph.GraphMapper类

 Giraph中自定义org.apache.giraph.graph.GraphMapper类来继承Hadoop中的 org.apache.hadoop.mapreduce.Mapper<Object,Object,Object,Object>类,覆写了setup()、map()、cleanup()和run()方法。GraphMapper类的说明如下:

This mapper that will execute the BSP graph tasks alloted to this worker. All tasks will be performed by calling the GraphTaskManager object managed by this GraphMapper wrapper classs. Since this mapper will not be passing data by key-value pairs through the MR framework, the Mapper parameter types are irrelevant, and set to Object type.

BSP的运算逻辑被封装在GraphMapper类中,其拥有一GraphTaskManager对象,用来管理Job的tasks。每个GraphMapper对象都相当于BSP中的一个计算节点(compute node)。

在GraphMapper类中的setup()方法中,创建GraphTaskManager对象并调用其setup()方法进行一些初始化工作。如下:

  @Override
  public void setup(Context context)
    throws IOException, InterruptedException {
    // Execute all Giraph-related role(s) assigned to this compute node.
    // Roles can include "master," "worker," "zookeeper," or . . . ?
    graphTaskManager = new GraphTaskManager<I, V, E, M>(context);
    graphTaskManager.setup(
      DistributedCache.getLocalCacheArchives(context.getConfiguration()));
  }
map()方法为空,因为所有操作都被封装在了GraphTaskManager类中。在run()方法中调用GraphTaskManager对象的execute()方法进行BSP迭代计算。

@Override
  public void run(Context context) throws IOException, InterruptedException {
    // Notify the master quicker if there is worker failure rather than
    // waiting for ZooKeeper to timeout and delete the ephemeral znodes
    try {
      setup(context);
      while (context.nextKeyValue()) {
        graphTaskManager.execute();
      }
      cleanup(context);
      // Checkstyle exception due to needing to dump ZooKeeper failure
    } catch (RuntimeException e) {
      graphTaskManager.zooKeeperCleanup();
      graphTaskManager.workerFailureCleanup();
    }
  }

2. org.apache.giraph.graph.GraphTaskManager 类

功能:The Giraph-specific business logic for a single BSP compute node in whatever underlying type of cluster our Giraph job will run on. Owning object will provide the glue into the underlying cluster framework and will call this object to perform Giraph work.

下面讲述setup()方法,代码如下。

 /**
   * Called by owner of this GraphTaskManager on each compute node
   * @param zkPathList the path to the ZK jars we need to run the job
   */
  public void setup(Path[] zkPathList) thro