
Hadoop 0.20.2 Fully Distributed Installation and Configuration


  • Three Linux virtual machines are used to build the distributed environment; at least three VMs are needed to actually demonstrate distributed behavior, such as distributed data storage

  • Give the three VMs network access and assign each one a static IP; for details, see the separate article on configuring a static IP for a virtual machine

  • Set up passwordless SSH login among the three VMs; for details, see the separate article on passwordless login for a Linux cluster (a minimal sketch is given below)
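
    A minimal sketch, run as the hadoop user on the master (First), assuming OpenSSH is installed and using the IPs from the table below; the linked article covers the full procedure:

    # Generate a key pair without a passphrase (accept the default location)
    ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
    # Copy the public key to every node, including the master itself,
    # so start-all.sh can log in without a password
    ssh-copy-id learn@192.168.118.3
    ssh-copy-id learn@192.168.118.4
    ssh-copy-id learn@192.168.118.5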

  • The three hosts are configured as follows

HOST     IP              SYSTEM      TYPE (node type)
First    192.168.118.3   CentOS 6.3  master
Second   192.168.118.4   CentOS 6.3  slave
Third    192.168.118.5   CentOS 6.3  slave
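
The configuration below refers to the nodes by host name (First, Second, Third), so every machine must be able to resolve these names. A minimal sketch of the /etc/hosts entries, assuming the IPs from the table above (append them as root on all three hosts):

# /etc/hosts (same entries on First, Second and Third)
192.168.118.3   First
192.168.118.4   Second
192.168.118.5   Third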

  • Install the JDK on all three hosts; for details, see the separate article on configuring the Java environment on Linux

  • It is recommended to create a dedicated user on each of the three VMs to run Hadoop (a minimal sketch follows)
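
    A minimal sketch, assuming the user name learn that appears in the shell prompts later in this article (run as root on each VM):

    # Create the hadoop user and give it a password
    useradd learn
    passwd learn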

  • Install Hadoop on the master node

    • Download the Hadoop 0.20.2 package (the link is given on the Hadoop release download page)

    • Install Hadoop: copy the package to the user's home directory and extract it with the following command

    tar -zxvf hadoop-0.20.2.tar.gz
    • Configure Hadoop; all configuration files are in the conf directory under the Hadoop installation directory

    • Configure hadoop-env.sh as follows

      
      # The java implementation to use. Required.
      
      export JAVA_HOME=/usr/local/java/jdk1.8.0


    • Configure core-site.xml

      <configuration>
      <property>
          <name>fs.default.name</name>
          <value>hdfs://First:9000</value>
      </property>
      </configuration>

      fs.default.name: the host and port of the HDFS NameNode (the default file system URI)


    • Configure hdfs-site.xml as follows

      <configuration>
            <property>
                    <name>dfs.data.dir</name>
                    <value>/home/learn/hadoop/hadoop-data</value>
            </property>
            <property>
                    <name>dfs.name.dir</name>
                    <value>/home/learn/hadoop/hadoop-name</value>
            </property>
            <property>
                    <name>fs.checkpoint.dir</name>
                    <value>/home/learn/hadoop/hadoop-nameSecondary</value>
            </property>
            <property>
                    <name>dfs.replication</name>
                    <value>2</value>
            </property>
      </configuration>

      dfs.data.dir: the directory where the DataNodes store data blocks; it must exist beforehand

      dfs.name.dir: the directory where the NameNode stores its metadata; it must exist beforehand

      fs.checkpoint.dir: the checkpoint directory used by the secondary NameNode; it must exist beforehand

      dfs.replication: the block replication factor (a sketch for creating the directories above follows)
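
      Because these directories must exist beforehand, create them before formatting HDFS. A minimal sketch, run as the hadoop user (dfs.name.dir and fs.checkpoint.dir are only read on the master, but creating all three on every node is harmless):

      mkdir -p /home/learn/hadoop/hadoop-data
      mkdir -p /home/learn/hadoop/hadoop-name
      mkdir -p /home/learn/hadoop/hadoop-nameSecondary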


    • Configure mapred-site.xml

      <configuration>
      <property>
          <name>mapred.job.tracker</name>
          <value>First:9001</value>
      </property>
      </configuration>

      mapred.job.tracker: the host and port of the JobTracker RPC server


    • Configure the masters file

      First


    • Configure the slaves file

      Second
      Third
  • Copy the configured Hadoop directory to the corresponding location on each slave node

    scp -r /home/learn/hadoop learn@192.168.118.4:/home/learn/hadoop
    scp -r /home/learn/hadoop learn@192.168.118.5:/home/learn/hadoop
  • Format HDFS (run on the master node from the hadoop/bin directory)

    ./hadoop namenode -format
  • Start all daemons (run on the master node from the hadoop/bin directory)

    ./start-all.sh 
  • Expected result on success (master node)

    [learn@first bin]$ ./start-all.sh
    starting namenode, logging to /home/learn/hadoop/hadoop-0.20.2/bin/../logs/hadoop-learn-namenode-first.out
    Second: starting datanode, logging to /home/learn/hadoop/hadoop-0.20.2/bin/../logs/hadoop-learn-datanode-second.out
    Third: starting datanode, logging to /home/learn/hadoop/hadoop-0.20.2/bin/../logs/hadoop-learn-datanode-third.out
    First: starting secondarynamenode, logging to /home/learn/hadoop/hadoop-0.20.2/bin/../logs/hadoop-learn-secondarynamenode-first.out
    starting jobtracker, logging to /home/learn/hadoop/hadoop-0.20.2/bin/../logs/hadoop-learn-jobtracker-first.out
    Second: starting tasktracker, logging to /home/learn/hadoop/hadoop-0.20.2/bin/../logs/hadoop-learn-tasktracker-second.out
    Third: starting tasktracker, logging to /home/learn/hadoop/hadoop-0.20.2/bin/../logs/hadoop-learn-tasktracker-third.out
    [learn@first bin]$ jps
    4753 Jps
    4614 SecondaryNameNode
    4463 NameNode
    4687 JobTracker

    Expected result on success (slave nodes). If these processes are missing on a slave, the firewall may not have been disabled; see the commands after the output below.

    [learn@second ~]$ jps
    4160 Jps
    4040 DataNode
    4122 TaskTracker
    [learn@second ~]$ 
    
    
    [learn@third ~]$ jps
    4071 DataNode
    4153 TaskTracker
    4191 Jps
    [learn@third ~]$ 
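
    If the DataNode/TaskTracker processes are missing on a slave, one common cause is the firewall, as noted above. A minimal sketch for turning it off on CentOS 6 (run as root on every node; in a real deployment you would open the required Hadoop ports instead):

    # Stop iptables now and keep it disabled across reboots (CentOS 6)
    service iptables stop
    chkconfig iptables off
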
  • WordCount test

    • Write the program as follows
    package org.apache.hadoop.examples;
    
    import java.io.IOException;
    import java.util.StringTokenizer;
    
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.GenericOptionsParser;
    
    public class WordCount {
    
        public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    
            private final static IntWritable one = new IntWritable(1);
            private Text word = new Text();
            public void map(Object key, Text value, Context context) throws IOException, InterruptedException
            {
                String line = value.toString();
                StringTokenizer itr = new StringTokenizer(line);
                while (itr.hasMoreTokens())
                {
                    word.set(itr.nextToken().toLowerCase());
                    context.write(word, one);
                }
            }
        }
    
        public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            private IntWritable result = new IntWritable();
    
            public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException
            {
                int sum = 0;
                for (IntWritable val : values)
                {
                    sum += val.get();
                }
                result.set(sum);
                context.write(key, result);
            }
        }
    
        public static void main(String[] args) throws Exception
        {
            Configuration conf = new Configuration();
            String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
            if (otherArgs.length != 2)
            {
                System.err.println("Usage: wordcount <in> <out>");
                System.exit(2);
            }
            Job job = new Job(conf, "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
            FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }
    • Package the above Java program into a jar file (a build sketch follows)
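
    A minimal command-line sketch for compiling and packaging, assuming the Hadoop core jar hadoop-0.20.2-core.jar in the installation directory and the output path /home/learn/Hadoop.jar used by the run command below; recording the Main-Class in the manifest lets hadoop jar run the job without naming the class explicitly:

    mkdir -p wordcount_classes
    # Compile against the Hadoop core jar
    javac -classpath /home/learn/hadoop/hadoop-0.20.2/hadoop-0.20.2-core.jar \
          -d wordcount_classes WordCount.java
    # Package the classes and set the main class in the manifest
    jar cvfe /home/learn/Hadoop.jar org.apache.hadoop.examples.WordCount -C wordcount_classes .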

    • Create the HDFS input directory, copy the two test files (one.txt and two.txt, shown below) into it, and delete the output directory if it already exists

    
    # Commands
    
    ./hadoop fs -mkdir /input
    ./hadoop fs -put /home/learn/one.txt /input
    ./hadoop fs -put /home/learn/two.txt /input
    ./hadoop fs -rmr /output
    
    #one.txt
    
    Pig
    Hadoop
    OK
    Girl
    Boy
    Boy
    Hbase 
    Hive
    Hive
    Hbase
    Hive
    son
    bitch
    bitch
    
    #two.txt
    
    Pig
    Dog
    Boy
    Girl
    Hadoop
    son
    Hadoop
    bitch
    Hive 
    bitch
    Hbase
    Hive
    ZooKeeper
    Hbase
    Hive
    • Run the program
    ./hadoop jar /home/learn/Hadoop.jar /input /output
    • Job output
    [learn@first bin]$ ./hadoop jar /home/learn/hadoop/Hadoop.jar /input /output
    18/04/05 18:03:45 INFO input.FileInputFormat: Total input paths to process : 2
    18/04/05 18:03:45 INFO mapred.JobClient: Running job: job_201804051741_0001
    18/04/05 18:03:46 INFO mapred.JobClient:  map 0% reduce 0%
    18/04/05 18:03:55 INFO mapred.JobClient:  map 50% reduce 0%
    18/04/05 18:03:58 INFO mapred.JobClient:  map 100% reduce 0%
    18/04/05 18:04:07 INFO mapred.JobClient:  map 100% reduce 100%
    18/04/05 18:04:09 INFO mapred.JobClient: Job complete: job_201804051741_0001
    18/04/05 18:04:09 INFO mapred.JobClient: Counters: 17
    18/04/05 18:04:09 INFO mapred.JobClient:   Map-Reduce Framework
    18/04/05 18:04:09 INFO mapred.JobClient:     Combine output records=19
    18/04/05 18:04:09 INFO mapred.JobClient:     Spilled Records=38
    18/04/05 18:04:09 INFO mapred.JobClient:     Reduce input records=19
    18/04/05 18:04:09 INFO mapred.JobClient:     Reduce output records=11
    18/04/05 18:04:09 INFO mapred.JobClient:     Map input records=29
    18/04/05 18:04:09 INFO mapred.JobClient:     Map output records=29
    18/04/05 18:04:09 INFO mapred.JobClient:     Map output bytes=270
    18/04/05 18:04:09 INFO mapred.JobClient:     Reduce shuffle bytes=225
    18/04/05 18:04:09 INFO mapred.JobClient:     Combine input records=29
    18/04/05 18:04:09 INFO mapred.JobClient:     Reduce input groups=11
    18/04/05 18:04:09 INFO mapred.JobClient:   FileSystemCounters
    18/04/05 18:04:09 INFO mapred.JobClient:     HDFS_BYTES_READ=156
    18/04/05 18:04:09 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=508
    18/04/05 18:04:09 INFO mapred.JobClient:     FILE_BYTES_READ=219
    18/04/05 18:04:09 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=80
    18/04/05 18:04:09 INFO mapred.JobClient:   Job Counters 
    18/04/05 18:04:09 INFO mapred.JobClient:     Launched map tasks=2
    18/04/05 18:04:09 INFO mapred.JobClient:     Launched reduce tasks=1
    18/04/05 18:04:09 INFO mapred.JobClient:     Data-local map tasks=2
    • Result file /output/part-r-00000
    [learn@first bin]$ ./hadoop fs -cat   /output/part-r-00000
    bitch   4
    boy 3
    dog 1
    girl    2
    hadoop  3
    hbase   4
    hive    6
    ok  1
    pig 2
    son 2
    zookeeper   1
