
news 2025/10/17 6:36:12


Table of Contents

- Official WordCount Source Code
- MapReduce Programming Conventions
- Common Serialization Types
- WordCount Hands-on Example
- Testing on the Cluster

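Official WordCount Source Code

To make the source easier to browse, we download the relevant files to the local machine.

Note: the mapreduce-examples JAR is located in the /opt/module/hadoop-3.1.3/share/hadoop/mapreduce directory.

Opening it with a decompiler shows that MapReduce ships with many examples, including the WordCount example we want to study.

The WordCount example consists of a driver class, a Map class, and a Reduce class, and its data types are Hadoop's own serializable types (for example, Text corresponds to Java's String and IntWritable corresponds to Java's int).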

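MapReduce Programming Conventions

A user program is divided into three parts: Mapper, Reducer, and Driver.

1. Mapper stage

(1) A user-defined Mapper must extend the Mapper parent class.
(2) The Mapper's input data is in the form of KV pairs (the KV types can be anything and are declared through generics).
(3) The Mapper's business logic goes in the map() method.
(4) The Mapper's output data is in the form of KV pairs (the KV types can be anything and are declared through generics).
(5) The map() method (run in the MapTask process) is called once for each <K,V> pair.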
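To make rules (2) and (5) concrete without a cluster, here is a plain-Java sketch (no Hadoop classes) of how the map stage is driven for a text file. With Hadoop's default TextInputFormat, each call receives one line as the value and the byte offset where that line starts as the key, which is the LongWritable in WordCount. The names MapStageSketch and runMapStage are hypothetical, and the sketch assumes UTF-8 text with single-byte "\n" line endings.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

// Hypothetical stand-in for the framework side of the map stage (no Hadoop classes)
class MapStageSketch {

    // Calls mapper once per line. Key: byte offset where the line starts
    // (what TextInputFormat passes as LongWritable). Value: the line without '\n'.
    // Returns the number of map calls made.
    static int runMapStage(String text, BiConsumer<Long, String> mapper) {
        int calls = 0;
        long offset = 0;
        int start = 0;
        while (start < text.length()) {
            int end = text.indexOf('\n', start);
            if (end < 0) {
                end = text.length(); // last line has no trailing newline
            }
            String line = text.substring(start, end);
            mapper.accept(offset, line);
            calls++;
            // Advance past the line's UTF-8 bytes plus the one-byte '\n'
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
            start = end + 1;
        }
        return calls;
    }

    public static void main(String[] args) {
        List<String> seen = new ArrayList<>();
        int calls = runMapStage("hello world\nhello hadoop\n",
                (offset, line) -> seen.add(offset + " -> " + line));
        System.out.println(calls + " map() calls: " + seen);
    }
}
```

For this two-line input, map() runs twice, with keys 0 and 12 ("hello world" is 11 bytes plus the newline).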

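2. Reducer stage

(1) A user-defined Reducer must extend the Reducer parent class.
(2) The Reducer's input data types match the Mapper's output data types, and are also KV pairs.
(3) The Reducer's business logic goes in the reduce() method.
(4) The ReduceTask process calls reduce() exactly once for each group of <K,V> pairs that share the same K.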
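Rule (4) relies on the shuffle between the two stages: map output is sorted by key and the values of equal keys are grouped, so reduce() sees each key exactly once together with an iterable of its values. The plain-Java sketch below imitates that contract with a TreeMap. The names ReduceStageSketch, ReduceFunction and runReduceStage are hypothetical, and a real job sorts and merges spilled files across machines rather than holding everything in memory.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the shuffle + reduce contract (no Hadoop classes)
class ReduceStageSketch {

    // Hypothetical reduce function: one key plus all values that share it
    interface ReduceFunction {
        void reduce(String key, Iterable<Integer> values, List<String> out);
    }

    // Sorts map output by key, groups the values of equal keys,
    // then calls reduceFn exactly once per distinct key, in sorted order
    static List<String> runReduceStage(List<Map.Entry<String, Integer>> mapOutput,
                                       ReduceFunction reduceFn) {
        TreeMap<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : mapOutput) {
            groups.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> group : groups.entrySet()) {
            reduceFn.reduce(group.getKey(), group.getValue(), out);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapOutput = new ArrayList<>();
        mapOutput.add(new AbstractMap.SimpleEntry<>("hadoop", 1));
        mapOutput.add(new AbstractMap.SimpleEntry<>("atguigu", 1));
        mapOutput.add(new AbstractMap.SimpleEntry<>("atguigu", 1));

        // Summing logic in the style of a WordCount reducer
        List<String> result = runReduceStage(mapOutput, (key, values, out) -> {
            int sum = 0;
            for (int v : values) {
                sum += v;
            }
            out.add(key + "\t" + sum);
        });
        result.forEach(System.out::println);
    }
}
```

Three map-output pairs but only two distinct keys, so the reduce function runs twice: once for atguigu with values (1, 1) and once for hadoop with (1).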

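3. Driver stage

The Driver acts as a client of the YARN cluster: it submits the whole program to the cluster, and what it submits is a Job object that wraps the MapReduce program's runtime parameters.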

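Common Serialization Types

Apart from String, which corresponds to Text, the Hadoop type for each common Java type is the Java type name followed by Writable (for example, int corresponds to IntWritable and long to LongWritable).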
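Hadoop's Writable types are mutable containers that serialize themselves through two methods, write(DataOutput) and readFields(DataInput). The sketch below re-creates that contract with only the JDK so the round trip runs without Hadoop on the classpath. SimpleWritable and IntBox are hypothetical stand-ins, not the real org.apache.hadoop.io.Writable and IntWritable, although like IntWritable this version stores its value as a 4-byte int.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical stand-in for org.apache.hadoop.io.Writable (same two-method shape)
interface SimpleWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical IntWritable-like box: a mutable int that serializes as 4 bytes
class IntBox implements SimpleWritable {
    private int value;

    IntBox() { }                        // no-arg constructor: create first, fill via readFields
    IntBox(int value) { this.value = value; }

    void set(int value) { this.value = value; }
    int get() { return value; }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(value);            // fixed 4-byte big-endian encoding
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        value = in.readInt();           // overwrites this object's state, so it can be reused
    }
}

class WritableRoundTrip {
    // Serializes a writable into a byte array
    static byte[] toBytes(SimpleWritable w) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            w.write(out);
        }
        return buffer.toByteArray();
    }

    // Deserializes bytes into an existing (reused) writable
    static void fromBytes(byte[] bytes, SimpleWritable target) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            target.readFields(in);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = toBytes(new IntBox(42));
        IntBox reused = new IntBox();
        fromBytes(bytes, reused);
        System.out.println(bytes.length + " bytes, value = " + reused.get());
    }
}
```

The no-argument constructor plus readFields() is what lets a framework allocate one object and refill it for every record instead of creating a new object each time.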

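WordCount Hands-on Example

Requirement: count the total number of times each word appears in a given text file and output the results.

(1) Input data: a text file whose words are separated by spaces.

Expected output: one line per word, giving the word and its total count.

(2) Requirement analysis

Following the MapReduce programming conventions, write a Mapper, a Reducer, and a Driver. The business logic of each stage is as follows:

Mapper: convert the line of text passed in by MapTask to a String, split it into words on spaces, and output each word as <word, 1>.
Reducer: sum the counts for each key (word) and output <word, total>.
Driver: get the configuration and Job object, set the jar path, link the Mapper and Reducer, set the map output KV types and the final output KV types, set the input and output paths, and submit the job.

With the work of each stage clear, we can set up the environment and write the code for each stage.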
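Before involving Hadoop, the same three-step logic can be checked in memory with plain Java streams: split each line on spaces (the map step), group equal words in sorted order (what the shuffle does), and count each group (the reduce step). This is only a sketch for small inputs. LocalWordCount is a hypothetical name and the sample lines are made up.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical in-memory check of the WordCount logic (no Hadoop classes)
class LocalWordCount {

    static Map<String, Long> count(List<String> lines) {
        return lines.stream()
                // "Map": split each line on single spaces
                .flatMap(line -> Arrays.stream(line.split(" ")))
                // Drop empty tokens produced by repeated spaces
                .filter(word -> !word.isEmpty())
                // "Shuffle" + "Reduce": group equal words in sorted key order and count each group
                .collect(Collectors.groupingBy(word -> word, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hello world", "hello hadoop");
        count(lines).forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

The TreeMap keeps keys sorted, matching the sorted order in which reduce() receives keys. The empty-token filter is a small deviation from a plain split(" ") mapper, which would emit empty strings as words when a line contains consecutive spaces.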

(3) Environment setup

(1) Create a new Maven project named MapReduceDemo.

(2) Add the following dependencies to pom.xml:

<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>3.1.3</version>
    </dependency>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.12</version>
    </dependency>
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-log4j12</artifactId>
        <version>1.7.30</version>
    </dependency>
</dependencies>

In the project's src/main/resources directory, create a new file named "log4j.properties" with the following content:

log4j.rootLogger=INFO, stdout 
log4j.appender.stdout=org.apache.log4j.ConsoleAppender 
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout 
log4j.appender.stdout.layout.ConversionPattern=%d %p [%c] - %m%n 
log4j.appender.logfile=org.apache.log4j.FileAppender 
log4j.appender.logfile.File=target/spring.log 
log4j.appender.logfile.layout=org.apache.log4j.PatternLayout 
log4j.appender.logfile.layout.ConversionPattern=%d %p [%c] - %m%n

(3) Create the package com.root.mapreduce.wordcount.

(4) Write the Mapper class

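package com.root.mapreduce.wordcount;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

/**
 * KEYIN:    type of the map input key    (LongWritable, the byte offset of the line)
 * VALUEIN:  type of the map input value  (Text, one line of the file)
 * KEYOUT:   type of the map output key   (Text, a word)
 * VALUEOUT: type of the map output value (IntWritable, the count 1)
 */
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    // Fields reused across every map() call and every loop iteration
    private Text outK = new Text();
    private IntWritable outV = new IntWritable(1);

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Note: creating new Text() here would still be slow, because map() is called
        // n times (once per line of the source file) and each call would allocate a new object

        // 1. Get one line
        String line = value.toString();

        // 2. Split the line into words
        String[] words = line.split(" ");

        // 3. Write out each word
        for (String word : words) {
            // Note: creating new Text() here would hurt performance badly, because a long
            // line would allocate a new Text on every iteration of this loop

            // Wrap the word in outK
            outK.set(word);
            // Emit <word, 1>
            context.write(outK, outV);
        }
    }
}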

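Note: we declare the Text variable outK and the IntWritable variable outV as fields of the Mapper class, which improves performance somewhat by avoiding creating a new object on every call and every loop iteration.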
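Reusing outK and outV is safe because the framework does not hold on to the objects passed to context.write(): Hadoop's map output collector serializes the key and value into its in-memory buffer during the call, so overwriting outK on the next iteration cannot change pairs already written. The JDK-only sketch below imitates that behavior. MutableText, SerializingContext and ReuseDemo are hypothetical stand-ins, not Hadoop's Text or Context, and the value is a plain int for brevity.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Text: a mutable, reusable string holder
class MutableText {
    private String value = "";
    void set(String value) { this.value = value; }
    String get() { return value; }
}

// Hypothetical context that serializes every pair at write() time
class SerializingContext {
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private final DataOutputStream out = new DataOutputStream(buffer);

    void write(MutableText key, int value) throws IOException {
        out.writeUTF(key.get());   // the key's current contents are copied out now
        out.writeInt(value);
    }

    // Reads back every pair that was written, in order
    List<String> readAll() throws IOException {
        out.flush();
        List<String> pairs = new ArrayList<>();
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(buffer.toByteArray()));
        while (in.available() > 0) {
            pairs.add(in.readUTF() + "=" + in.readInt());
        }
        return pairs;
    }
}

class ReuseDemo {
    static List<String> mapLine(String line) throws IOException {
        MutableText outK = new MutableText();   // created once, reused for every word
        SerializingContext context = new SerializingContext();
        for (String word : line.split(" ")) {
            outK.set(word);
            context.write(outK, 1);
        }
        return context.readAll();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(mapLine("hello world hello"));
    }
}
```

Every pair survives intact even though a single MutableText is reused. If the context stored references instead of copying, all three entries would show the last value set, which is the situation the serialize-on-write design avoids.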

(5) Write the Reducer class

package com.root.mapreduce.wordcount;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

/**
 * KEYIN:    type of the reduce input key    (Text, a word)
 * VALUEIN:  type of the reduce input value  (IntWritable)
 * KEYOUT:   type of the reduce output key   (Text)
 * VALUEOUT: type of the reduce output value (IntWritable, the total count)
 */
public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    // Reused for every key instead of allocating a new IntWritable per call
    private IntWritable outV = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // e.g. key = atguigu, values = (1, 1)
        int sum = 0;

        // Accumulate the counts
        for (IntWritable value : values) {
            sum += value.get();
        }
        outV.set(sum);

        // Emit <word, total>
        context.write(key, outV);
    }
}

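Note: likewise, we declare the IntWritable variable outV as a field of the Reducer class, which improves performance somewhat by avoiding creating a new object on every reduce() call.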

(6) Write the Driver class

package com.root.mapreduce.wordcount;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class WordCountDriver {
    public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
        // 1. Get the job
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);

        // 2. Set the jar path (the jar that contains this class)
        job.setJarByClass(WordCountDriver.class);

        // 3. Link the Mapper and Reducer
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);

        // 4. Set the KV types of the map output
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        // 5. Set the KV types of the final output
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // 6. Set the input and output paths
        FileInputFormat.setInputPaths(job, new Path("D:\\java_learning\\input"));
        FileOutputFormat.setOutputPath(job, new Path("D:\\java_learning\\output\\output1"));

        // 7. Submit the job and wait for it to finish
        boolean result = job.waitForCompletion(true);
        System.exit(result ? 0 : 1);
    }
}

Run the main method, then check the files in the output path (D:\java_learning\output\output1).

Open the part-r-00000 file in EditPlus to view the word counts.
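With its default settings, TextOutputFormat writes one record per line: the key, a tab, then the value. That makes part-r-00000 easy to read back with ordinary Java, for example to check results automatically. The helper below is a hypothetical sketch (PartFileReader and parse are not Hadoop APIs) that assumes the default tab separator and keys that contain no tabs; the sample text is made up.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for reading WordCount output (not a Hadoop API)
class PartFileReader {

    // Parses "word<TAB>count" lines, the layout TextOutputFormat writes by default
    static Map<String, Integer> parse(BufferedReader reader) throws IOException {
        Map<String, Integer> counts = new LinkedHashMap<>();
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.isEmpty()) {
                continue;
            }
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                throw new IllegalArgumentException("No tab separator in line: " + line);
            }
            counts.put(line.substring(0, tab), Integer.parseInt(line.substring(tab + 1)));
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample in the part-r-00000 layout; for the real file, pass
        // java.nio.file.Files.newBufferedReader(...) on the part-r-00000 path instead
        String sample = "hadoop\t1\nhello\t2\n";
        System.out.println(parse(new BufferedReader(new StringReader(sample))));
    }
}
```

A LinkedHashMap keeps the lines in file order, which for a single reducer is already sorted by key.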

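Testing on the Cluster

So far we have run the program in the local Windows environment. In production, jobs usually run on a cluster, so next we look at how to run and test the program there.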

(1) To build the jar with Maven, add the following packaging plugins to pom.xml:

<build>
    <plugins>
        <plugin>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.6.1</version>
            <configuration>
                <source>1.8</source>
                <target>1.8</target>
            </configuration>
        </plugin>
        <plugin>
            <artifactId>maven-assembly-plugin</artifactId>
            <configuration>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
            </configuration>
            <executions>
                <execution>
                    <id>make-assembly</id>
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>

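When testing on the cluster, we want to pass in the input and output paths at run time, so change step 6 of the Driver to build its paths from the program arguments: new Path(args[0]) for the input path and new Path(args[1]) for the output path. args[0] is the first path argument typed in the XShell console (the input path), and args[1] is the second (the output path).

Then build the package.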
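One way to keep a single Driver usable both from the IDE and on the cluster is to take the two paths from args when they are supplied and fall back to the local Windows paths otherwise. The plain-Java sketch below shows only that argument handling; DriverArgs, resolvePaths and the fallback behavior are hypothetical additions, not part of the original Driver. The returned strings would then be wrapped in new Path(...) and passed to FileInputFormat.setInputPaths and FileOutputFormat.setOutputPath.

```java
// Hypothetical argument handling for the Driver (not part of the original program)
class DriverArgs {
    // Local paths used when running from the IDE without arguments
    static final String LOCAL_INPUT = "D:\\java_learning\\input";
    static final String LOCAL_OUTPUT = "D:\\java_learning\\output\\output1";

    // Returns {inputPath, outputPath}: args[0]/args[1] on the cluster, local defaults otherwise
    static String[] resolvePaths(String[] args) {
        if (args.length >= 2) {
            return new String[] {args[0], args[1]};
        }
        if (args.length == 1) {
            // A single argument is almost certainly a mistake: fail instead of guessing
            throw new IllegalArgumentException("Usage: WordCountDriver <input path> <output path>");
        }
        return new String[] {LOCAL_INPUT, LOCAL_OUTPUT};
    }

    public static void main(String[] args) {
        String[] paths = resolvePaths(new String[] {"/haha", "/output"});
        System.out.println("input=" + paths[0] + ", output=" + paths[1]);
    }
}
```

With the command used later (hadoop jar wc.jar com.root.mapreduce.wordcount.WordCountDriver /haha /output), args holds /haha and /output, so the cluster paths are used.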

(2) Package the program into a jar.

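(3) Rename the jar without dependencies to wc.jar and copy it to /opt/module/hadoop-3.1.3 on the Hadoop cluster.

Note: you can drag the file from Windows straight into the XShell window to upload it; listing the directory afterwards shows that wc.jar has been uploaded.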

(4) Start the Hadoop cluster

myhadoop.sh start

(5) Check the status of the cluster nodes

jpsall

(6) Run the WordCount program

hadoop jar wc.jar com.root.mapreduce.wordcount.WordCountDriver /haha /output

Here the input path is the /haha directory on the cluster (the files under it), and the output path is /output. The output path must not already exist when the job runs.
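Hadoop enforces this rule at submission time: FileOutputFormat.checkOutputSpecs throws a FileAlreadyExistsException when the output directory exists, which protects earlier results from being overwritten. A caller can fail fast with the same check before submitting. The sketch below does this for a local directory, such as the Windows output path used in the local run, with java.nio.file. OutputDirCheck is a hypothetical helper; for HDFS the equivalent check would go through Hadoop's FileSystem API (for example FileSystem.exists) instead.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical pre-submit check for a local output directory
class OutputDirCheck {

    // Throws if the output directory already exists, mirroring the rule Hadoop enforces
    static void requireAbsent(Path outputDir) {
        if (Files.exists(outputDir)) {
            throw new IllegalStateException(
                    "Output directory " + outputDir + " already exists; delete it or choose a new path");
        }
    }

    public static void main(String[] args) {
        requireAbsent(Paths.get("D:\\java_learning\\output\\output1"));
        System.out.println("output path is free");
    }
}
```

Failing before submission gives a clearer message than a job that starts and is then rejected, and it avoids accidentally deleting results that should be kept.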

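After the job finishes, check that the files under /output exist and inspect their contents.

The result file contains the word counts for the input file hello.txt and matches its contents.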
