Published: 2015-12-07 10:32  Tags: Cloud computing and big data, Hadoop tutorial, Hadoop cluster testing, massive data
This chapter shows how to test a Hadoop cluster. We create a directory on HDFS, upload a file, run the wordcount example job, and inspect its output.
Create a test1 directory on HDFS and upload a file into it:
[hadoop@TEST085 hadoop-0.20.203.0]$ bin/hadoop fs -mkdir test1
[hadoop@TEST085 hadoop-0.20.203.0]$ bin/hadoop fs -put ./README.txt test1 
[hadoop@TEST085 hadoop-0.20.203.0]$ bin/hadoop fs -ls 
Found 1 items 
drwxr-xr-x - hadoop supergroup        0 2011-07-21 19:58 /user/hadoop/test1 
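The commands above used the relative path `test1`, yet the listing shows `/user/hadoop/test1`: relative HDFS paths are resolved against the user's home directory, `/user/<username>`. The Python sketch below only illustrates that convention. `resolve_hdfs_path` is a hypothetical helper, not part of Hadoop, and it assumes plain paths (full URIs such as `hdfs://host:port/...` are not handled).

```python
import posixpath


def resolve_hdfs_path(path: str, user: str) -> str:
    """Hypothetical helper: resolve a path the way the listing above suggests.

    Absolute paths are kept (normalized); relative paths are taken
    relative to the user's HDFS home directory, /user/<user>.
    """
    if path.startswith("/"):
        return posixpath.normpath(path)
    home = posixpath.join("/user", user)
    return posixpath.normpath(posixpath.join(home, path))


if __name__ == "__main__":
    for p in ["test1", "test1/README.txt", "output1", "/user/hadoop/test1/README.txt"]:
        print(p, "->", resolve_hdfs_path(p, "hadoop"))
```

The same rule explains why the job output `output1` used below ends up at `/user/hadoop/output1`.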
Run the wordcount MapReduce example program on the uploaded file:
[hadoop@TEST085 hadoop-0.20.203.0]$ hadoop jar hadoop-examples-0.20.203.0.jar wordcount /user/hadoop/test1/README.txt output1
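To make clear what the job computes, here is a single-process Python sketch of the same map, combine/shuffle and reduce logic. It is not Hadoop's code: the example's mapper tokenizes with Java's `StringTokenizer`, and Python's `str.split()` is close to that but also splits on a few extra whitespace characters. Like the real job, it keeps punctuation attached to words, which is why tokens such as `(BIS),` appear in the output further down. `map_phase` and `reduce_phase` are illustrative names.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Mapper: split each line on whitespace and emit (word, 1)."""
    for line in lines:
        for word in line.split():
            yield word, 1


def reduce_phase(pairs: Iterable[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Shuffle + reducer: sum the counts per word, output sorted by word.

    In the real job the combiner applies this same summation on the map
    side before the shuffle, which shrinks the data sent to the reducer.
    """
    totals: Counter = Counter()
    for word, count in pairs:
        totals[word] += count
    # Sorting str by code point matches the byte-wise order of UTF-8 keys.
    return sorted(totals.items())


if __name__ == "__main__":
    sample = ["Apache Hadoop", "Hadoop MapReduce and HDFS"]
    for word, count in reduce_phase(map_phase(sample)):
        print(f"{word}\t{count}")
```

In the sketch everything runs in one process; on the cluster, the map and reduce steps run as separate tasks, which is what the "Launched map tasks" and "Launched reduce tasks" counters in the job log report.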
The job prints the following:
11/07/22 15:21:29 INFO input.FileInputFormat: Total input paths to process : 1 
11/07/22 15:21:30 INFO mapred.JobClient: Running job: job_201107221440_0001 
11/07/22 15:21:31 INFO mapred.JobClient: map 0% reduce 0% 
11/07/22 15:21:51 INFO mapred.JobClient: map 100% reduce 0% 
11/07/22 15:22:09 INFO mapred.JobClient: map 100% reduce 100% 
11/07/22 15:22:15 INFO mapred.JobClient: Job complete: job_201107221440_0001 
11/07/22 15:22:15 INFO mapred.JobClient: Counters: 25 
11/07/22 15:22:15 INFO mapred.JobClient: Job Counters 
11/07/22 15:22:15 INFO mapred.JobClient:        Launched reduce tasks=1 
11/07/22 15:22:15 INFO mapred.JobClient:        SLOTS_MILLIS_MAPS=18252 
11/07/22 15:22:15 INFO mapred.JobClient:        Total time spent by all reduces waiting after reserving slots (ms)=0
11/07/22 15:22:15 INFO mapred.JobClient:        Total time spent by all maps waiting after reserving slots (ms)=0
11/07/22 15:22:15 INFO mapred.JobClient:        Launched map tasks=1
11/07/22 15:22:15 INFO mapred.JobClient:        Data-local map tasks=1 
11/07/22 15:22:15 INFO mapred.JobClient:        SLOTS_MILLIS_REDUCES=15479 
11/07/22 15:22:15 INFO mapred.JobClient:      File Output Format Counters 
11/07/22 15:22:15 INFO mapred.JobClient:        Bytes Written=1306 
11/07/22 15:22:15 INFO mapred.JobClient:      FileSystemCounters 
11/07/22 15:22:15 INFO mapred.JobClient:        FILE_BYTES_READ=1836 
11/07/22 15:22:15 INFO mapred.JobClient:        HDFS_BYTES_READ=1485 
11/07/22 15:22:15 INFO mapred.JobClient:        FILE_BYTES_WRITTEN=45989 
11/07/22 15:22:15 INFO mapred.JobClient:        HDFS_BYTES_WRITTEN=1306 
11/07/22 15:22:15 INFO mapred.JobClient:      File Input Format Counters 
11/07/22 15:22:15 INFO mapred.JobClient:        Bytes Read=1366 
11/07/22 15:22:15 INFO mapred.JobClient:      Map-Reduce Framework 
11/07/22 15:22:15 INFO mapred.JobClient:        Reduce input groups=131 
11/07/22 15:22:15 INFO mapred.JobClient:        Map output materialized bytes=1836 
11/07/22 15:22:15 INFO mapred.JobClient:        Combine output records=131 
11/07/22 15:22:15 INFO mapred.JobClient:        Map input records=31 
11/07/22 15:22:15 INFO mapred.JobClient:        Reduce shuffle bytes=1836 
11/07/22 15:22:15 INFO mapred.JobClient:        Reduce output records=131 
11/07/22 15:22:15 INFO mapred.JobClient:        Spilled Records=262 
11/07/22 15:22:15 INFO mapred.JobClient:        Map output bytes=2055 
11/07/22 15:22:15 INFO mapred.JobClient:        Combine input records=179 
11/07/22 15:22:15 INFO mapred.JobClient:        Map output records=179 
11/07/22 15:22:15 INFO mapred.JobClient:        SPLIT_RAW_BYTES=119 
11/07/22 15:22:15 INFO mapred.JobClient:        Reduce input records=131 
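The counters can serve as a quick sanity check of the run. In this log, Map output records (179) equals Combine input records, the combiner reduced them to 131 records, and the single reducer received and wrote 131 records, one per distinct word. The Python sketch below parses the counter lines and checks those relationships. They are assumptions that hold for this single-map, single-reduce WordCount run, not general Hadoop guarantees (for example, a combiner may run zero or several times in other jobs). `parse_counters` and `check_wordcount_counters` are hypothetical helpers.

```python
import re
from typing import Dict, List

# Matches counter lines such as
# "11/07/22 15:22:15 INFO mapred.JobClient:        Map output records=179"
COUNTER_RE = re.compile(r"JobClient:\s+(.+?)=(\d+)\s*$")


def parse_counters(log_text: str) -> Dict[str, int]:
    """Collect "name=value" counters from JobClient console output."""
    counters: Dict[str, int] = {}
    for line in log_text.splitlines():
        match = COUNTER_RE.search(line)
        if match:
            counters[match.group(1).strip()] = int(match.group(2))
    return counters


def check_wordcount_counters(counters: Dict[str, int]) -> List[str]:
    """Return the violated relationships (an empty list if all hold)."""
    problems: List[str] = []

    def expect_equal(a: str, b: str) -> None:
        if a not in counters or b not in counters:
            problems.append(f"missing counter: {a!r} or {b!r}")
        elif counters[a] != counters[b]:
            problems.append(f"{a}={counters[a]} but {b}={counters[b]}")

    # Assumption for this run: all map output went through the combiner.
    expect_equal("Map output records", "Combine input records")
    # One map task, one reducer: the reducer receives the combiner's output.
    expect_equal("Combine output records", "Reduce input records")
    # The single reducer fetched all of the map's materialized output.
    expect_equal("Map output materialized bytes", "Reduce shuffle bytes")
    # WordCount writes exactly one line per distinct word (reduce group).
    expect_equal("Reduce input groups", "Reduce output records")
    # The reducer's output file is what was written to HDFS in this job.
    expect_equal("Bytes Written", "HDFS_BYTES_WRITTEN")
    return problems
```

If the console output is saved to a file (say, a hypothetical `job.log`), `check_wordcount_counters(parse_counters(open("job.log").read()))` should return an empty list for the run shown here.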
Inspect the output files, which are stored on HDFS:
[hadoop@TEST085 hadoop-0.20.203.0]$ hadoop fs -ls output1 
Found 3 items 
-rw-r--r--   3 hadoop supergroup          0 2011-07-22 15:22 /user/hadoop/output1/_SUCCESS
drwxr-xr-x   - hadoop supergroup          0 2011-07-22 15:21 /user/hadoop/output1/_logs
-rw-r--r--   3 hadoop supergroup       1306 2011-07-22 15:22 /user/hadoop/output1/part-r-00000
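The output directory holds an empty `_SUCCESS` marker, a `_logs` directory, and `part-r-00000`, the output of reducer number 0 ("r" for reduce). Its size, 1306 bytes, matches the Bytes Written and HDFS_BYTES_WRITTEN counters in the job log. The Python sketch below turns such a listing into a short summary so a test script can check for the marker and the part files. It is a hypothetical helper that assumes the eight-column layout shown here (permissions, replication, owner, group, size, date, time, path) and paths without spaces.

```python
from typing import List, NamedTuple


class OutputSummary(NamedTuple):
    succeeded: bool          # True if the _SUCCESS marker file is present
    part_files: List[str]    # names of the part-* files, sorted
    part_bytes: int          # total size of the part-* files in bytes


def summarize_output_listing(listing: str) -> OutputSummary:
    """Summarize the text printed by `hadoop fs -ls <output dir>`."""
    succeeded = False
    part_files: List[str] = []
    part_bytes = 0
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) != 8 or not fields[4].isdigit():
            continue  # skips "Found 3 items" and lines not in the expected layout
        size, path = int(fields[4]), fields[7]
        name = path.rsplit("/", 1)[-1]
        if name == "_SUCCESS":
            succeeded = True
        elif name.startswith("part-"):
            part_files.append(name)
            part_bytes += size
    return OutputSummary(succeeded, sorted(part_files), part_bytes)
```

For the listing above, the summary reports the marker as present, a single part file `part-r-00000`, and 1306 bytes of output.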
[hadoop@TEST085 hadoop-0.20.203.0]$ hadoop fs -cat output1/part-r-00000  
(BIS),                          1
(ECCN)                          1
(TSU)                           1
(see                            1
5D002.C.1,                      1
740.13)                         1
<http://www.wassenaar.org/>     1
Administration                  1
Apache                          1
BEFORE                          1
BIS                             1
Bureau                          1
Commerce,                       1
... (remaining output omitted)
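Each line of `part-r-00000` is a word and its count. With the default text output format the two fields are separated by a tab, although a copied listing may show spaces instead. The Python sketch below parses a saved copy of the file (for example, `hadoop fs -cat output1/part-r-00000 > wordcount.txt`, where `wordcount.txt` is a hypothetical local file name) and lists the most frequent words. `parse_wordcount_output` and `top_words` are illustrative names, and the parser assumes real tab separators.

```python
from typing import Dict, List, Tuple


def parse_wordcount_output(text: str) -> Dict[str, int]:
    """Parse lines of the form "<word>\\t<count>" into a dict."""
    counts: Dict[str, int] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        word, sep, count = line.rpartition("\t")
        if not sep:
            raise ValueError(f"expected a tab-separated line, got {line!r}")
        counts[word] = int(count)
    return counts


def top_words(counts: Dict[str, int], n: int) -> List[Tuple[str, int]]:
    """Most frequent words first; ties broken alphabetically."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
```

Sorting by count here differs from the file itself, which is ordered by word because the reducer's input is sorted by key.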

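About this series
    Hadoop is an open-source distributed computing framework from the Apache open-source organization, and it is already used by many large websites, such as Amazon, Facebook and Yahoo. The two core components of the Hadoop framework are MapReduce and HDFS. The idea behind MapReduce became widely known through a paper published by Google; in one sentence, MapReduce means "decompose the task, then aggregate the results". HDFS stands for Hadoop Distributed File System and provides the underlying storage for distributed computation. This tutorial series introduces and explains Hadoop in detail, and its examples are meant to help readers learn Hadoop faster.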