Installing Apache Hadoop and HBase on CentOS (Single Node)
【README】
1. Parts of this article are adapted from:
https://computingforgeeks.com/how-to-install-apache-hadoop-hbase-on-centos-7/
2. This article installs HBase on a single machine (for learning purposes only);
【1】Update the system
Hadoop and HBase use many ports, some dynamically assigned; so that HBase can freely access system resources and the network, disable SELinux and the firewall before installing (acceptable here since this is a single machine for learning);
sudo systemctl disable --now firewalld
sudo setenforce 0
sudo sed -i 's/^SELINUX=.*/SELINUX=permissive/g' /etc/selinux/config
cat /etc/selinux/config | grep SELINUX= | grep -v '#'
Update the system packages and reboot
sudo yum -y install epel-release
sudo yum -y install vim wget curl bash-completion
sudo yum -y update
sudo reboot
【2】Install Java
sudo yum -y install java-1.8.0-openjdk java-1.8.0-openjdk-devel
Verify the Java version
[root@centos202 ~]# java -version
java version "1.8.0_271"
Java(TM) SE Runtime Environment (build 1.8.0_271-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.271-b09, mixed mode)
Set the JAVA_HOME environment variable
cat <<EOF | sudo tee /etc/profile.d/hadoop_java.sh
export JAVA_HOME=\$(dirname \$(dirname \$(readlink \$(readlink \$(which javac)))))
export PATH=\$PATH:\$JAVA_HOME/bin
EOF
Update the $PATH variable and apply the settings
source /etc/profile.d/hadoop_java.sh
[root@centos202 profile.d]# echo $JAVA_HOME
/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.312.b07-2.el8_5.x86_64
【3】Create a hadoop account
Create a dedicated hadoop account;
sudo adduser hadoop
passwd hadoop
sudo usermod -aG wheel hadoop
Generate an SSH key for passwordless login
[root@centos202 ~]# sudo su - hadoop
[hadoop@centos202 ~]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa): 
Created directory '/home/hadoop/.ssh'.
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:HuqG5V6O7Od64sQdmXUFedVNc/GEda4ujA+xuivxs2k hadoop@centos202
The key's randomart image is:
+---[RSA 3072]----+
|            .o.B@|
|            ..ooB|
|          . ..  o|
|         + .   . |
|        S .   .  |
|     .o+ o = .   |
|     ++o+ + o .  |
|    .+=+Eo o .   |
|     =OOO=  .    |
+----[SHA256]-----+
Add the hadoop user's key to the authorized list for passwordless SSH login;
[hadoop@centos202 ~]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[hadoop@centos202 ~]$ chmod 0600 ~/.ssh/authorized_keys
Log in to the local machine using the generated SSH key
[hadoop@centos202 ~]$ ssh localhost
The authenticity of host 'localhost (::1)' can't be established.
ECDSA key fingerprint is SHA256:EdoFy44sWPaZHE6jgJCVGkbGKxK63ToPAP24sQ2Gj3Y.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
Last login: Sun Mar  5 03:56:07 2023
【4】Download and install Hadoop
Download Hadoop; releases are listed at https://hadoop.apache.org/releases.html ;
Method 1) Download directly with wget
wget https://www-eu.apache.org/dist/hadoop/common/hadoop-$RELEASE/hadoop-$RELEASE.tar.gz
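For example, with the release used in this article (a sketch; the archive.apache.org URL below is an assumption, as mirrors come and go but the Apache archive keeps old releases):
# Hypothetical concrete download of the 3.2.4 release used here
RELEASE=3.2.4
wget https://archive.apache.org/dist/hadoop/common/hadoop-$RELEASE/hadoop-$RELEASE.tar.gz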
Method 2) Download it locally through a proxy (Windows 10), then transfer it to CentOS with rz (the approach used in this article);
The version used in this article: hadoop-3.2.4.tar.gz
[root@centos202 hadoop]# ls -l
total 480832
-rwxrwxrwx. 1 root root 492368219 Jan 30 22:20 hadoop-3.2.4.tar.gz
Extract the archive
tar -xzvf hadoop-3.2.4.tar.gz 
# Result
[root@centos202 hadoop-3.2.4]# pwd
/usr/local/hadoop/hadoop-3.2.4
[root@centos202 hadoop-3.2.4]# ls
bin  etc  include  lib  libexec  LICENSE.txt  NOTICE.txt  README.txt  sbin  share
Add the Hadoop home directory to $PATH
cat <<EOF | sudo tee /etc/profile.d/hadoop_java.sh
export JAVA_HOME=\$(dirname \$(dirname \$(readlink \$(readlink \$(which javac)))))
export HADOOP_HOME=/usr/local/hadoop/hadoop-3.2.4
export HADOOP_HDFS_HOME=\$HADOOP_HOME
export HADOOP_MAPRED_HOME=\$HADOOP_HOME
export YARN_HOME=\$HADOOP_HOME
export HADOOP_COMMON_HOME=\$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=\$HADOOP_HOME/lib/native
export PATH=\$PATH:\$JAVA_HOME/bin:\$HADOOP_HOME/bin:\$HADOOP_HOME/sbin
EOF
Run source to refresh the current shell environment and load the variables defined in hadoop_java.sh;
source /etc/profile.d/hadoop_java.sh
Check the Hadoop version:
[root@centos202 hadoop-3.2.4]# hadoop version
Hadoop 3.2.4
Source code repository Unknown -r 7e5d9983b388e372fe640f21f048f2f2ae6e9eba
Compiled by ubuntu on 2022-07-12T11:58Z
Compiled with protoc 2.5.0
From source with checksum ee031c16fe785bbb35252c749418712
This command was run using /usr/local/hadoop/hadoop-3.2.4/share/hadoop/common/hadoop-common-3.2.4.jar
【5】Configure Hadoop
All Hadoop configuration files live under /usr/local/hadoop/hadoop-3.2.4/etc/hadoop;
[root@centos202 hadoop]# pwd
/usr/local/hadoop/hadoop-3.2.4/etc/hadoop
[root@centos202 hadoop]# ls 
capacity-scheduler.xml      hadoop-policy.xml                 kms-acls.xml          mapred-queues.xml.template     yarn-env.cmd
configuration.xsl           hadoop-user-functions.sh.example  kms-env.sh            mapred-site.xml                yarn-env.sh
container-executor.cfg      hdfs-site.xml                     kms-log4j.properties  shellprofile.d                 yarnservice-log4j.properties
core-site.xml               httpfs-env.sh                     kms-site.xml          ssl-client.xml.example         yarn-site.xml
hadoop-env.cmd              httpfs-log4j.properties           log4j.properties      ssl-server.xml.example
hadoop-env.sh               httpfs-signature.secret           mapred-env.cmd        user_ec_policies.xml.template
hadoop-metrics2.properties  httpfs-site.xml                   mapred-env.sh         workers
Several of these configuration files must be edited to complete the Hadoop installation;
【5.1】hadoop-env.sh
Edit the JAVA_HOME setting in the hadoop-env.sh file (line 54)
vim hadoop-env.sh
export JAVA_HOME=$(dirname $(dirname $(readlink $(readlink $(which javac))))) 
Edit the core-site.xml file
core-site.xml contains the information needed at Hadoop cluster startup; its properties include:
the port number of the Hadoop instance;
the memory allocated for the file system;
the memory limit for data storage;
the size of the read/write buffers;
Edit it as follows: add the file-system property inside the <configuration> element:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
    <description>The default file system URI</description>
  </property>
</configuration>
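As a quick sanity check of this setting, hdfs getconf prints the effective configuration (a sketch; note that fs.default.name is the deprecated alias of fs.defaultFS, which Hadoop maps automatically):
# Print the effective default file system URI
hdfs getconf -confKey fs.defaultFS
# expected output: hdfs://localhost:9000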
【5.2】hdfs-site.xml
This file must be configured on every host in the cluster. It contains:
the paths of the namenode and datanode on the local file system;
the replication factor;
Create the namenode and datanode directories, and change the owner of /hadoop to hadoop:hadoop
[hadoop@centos202 hadoop]$ sudo mkdir -p /hadoop/hdfs/{namenode,datanode}
[sudo] password for hadoop: 
[hadoop@centos202 hadoop]$ 
[hadoop@centos202 hadoop]$ sudo chown -R hadoop:hadoop /hadoop
Edit hdfs-site.xml as follows:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>file:///hadoop/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>file:///hadoop/hdfs/datanode</value>
  </property>
</configuration>
【5.3】mapred-site.xml
Sets the MapReduce framework to use;
Edit it as follows:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
【5.4】yarn-site.xml
yarn-site.xml defines the resource-management and job-scheduling logic. Edit it as follows;
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
【6】Verify the Hadoop configuration (start Hadoop)
Switch to the hadoop user,
sudo su - hadoop
【6.1】Format the HDFS namenode
What formatting does: it wipes the HDFS storage directories; the directories for the namenode and datanode are emptied when the namenode is formatted.
In short: the namenode maintains the metadata associated with the datanodes; when we format, that metadata is cleared too, so the storage can be reused for new data.
See also: https://stackoverflow.com/questions/27143409/what-the-command-hadoop-namenode-format-will-do
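A minimal sketch of this step on a fresh single-node install (run once as the hadoop user; this destroys any existing HDFS data):
# Format the namenode storage directory configured in hdfs-site.xml
hdfs namenode -format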
【6.2】Start HDFS
[hadoop@centos202 ~]$ start-dfs.sh
Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [centos202]
centos202: Warning: Permanently added 'centos202,192.168.163.202' (ECDSA) to the list of known hosts.
2023-03-05 05:18:42,008 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hadoop@centos202 ~]$ 
【6.3】Start YARN
[hadoop@centos202 ~]$ start-yarn.sh
Starting resourcemanager
Starting nodemanagers
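With HDFS and YARN both up, the bundled example job makes a quick smoke test (a sketch; the examples jar ships under $HADOOP_HOME, and on some Hadoop 3.x setups you may also need to set mapreduce.application.classpath in mapred-site.xml first):
# Estimate pi on YARN with 2 map tasks and 5 samples per map
yarn jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.4.jar pi 2 5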
【6.4】Hadoop web UI
1) The default web UI ports in Hadoop 3.x are:
namenode (Hadoop dashboard): 9870
resource manager (Hadoop cluster overview): 8088
MapReduce job history server: 19888
List the ports Hadoop is using:
[hadoop@centos202 ~]$ ss -tunelp | grep java
tcp   LISTEN 0      128          0.0.0.0:8030       0.0.0.0:*    users:(("java",pid=15007,fd=320)) uid:1000 ino:81561 sk:1 <-> 
tcp   LISTEN 0      128          0.0.0.0:8031       0.0.0.0:*    users:(("java",pid=15007,fd=310)) uid:1000 ino:80766 sk:2 <-> 
tcp   LISTEN 0      128          0.0.0.0:8032       0.0.0.0:*    users:(("java",pid=15007,fd=330)) uid:1000 ino:81957 sk:3 <-> 
tcp   LISTEN 0      128          0.0.0.0:8033       0.0.0.0:*    users:(("java",pid=15007,fd=299)) uid:1000 ino:80016 sk:4 <-> 
tcp   LISTEN 0      128          0.0.0.0:41059      0.0.0.0:*    users:(("java",pid=15158,fd=306)) uid:1000 ino:88126 sk:5 <-> 
tcp   LISTEN 0      128        127.0.0.1:44965      0.0.0.0:*    users:(("java",pid=14546,fd=279)) uid:1000 ino:74525 sk:6 <-> 
tcp   LISTEN 0      128          0.0.0.0:8040       0.0.0.0:*    users:(("java",pid=15158,fd=317)) uid:1000 ino:88014 sk:7 <-> 
tcp   LISTEN 0      128          0.0.0.0:9864       0.0.0.0:*    users:(("java",pid=14546,fd=308)) uid:1000 ino:74543 sk:8 <-> 
tcp   LISTEN 0      128        127.0.0.1:9000       0.0.0.0:*    users:(("java",pid=14418,fd=285)) uid:1000 ino:70479 sk:9 <-> 
tcp   LISTEN 0      128          0.0.0.0:8042       0.0.0.0:*    users:(("java",pid=15158,fd=328)) uid:1000 ino:88858 sk:a <-> 
tcp   LISTEN 0      128          0.0.0.0:9866       0.0.0.0:*    users:(("java",pid=14546,fd=278)) uid:1000 ino:74483 sk:b <-> 
tcp   LISTEN 0      128          0.0.0.0:9867       0.0.0.0:*    users:(("java",pid=14546,fd=309)) uid:1000 ino:74560 sk:c <-> 
tcp   LISTEN 0      128          0.0.0.0:9868       0.0.0.0:*    users:(("java",pid=14770,fd=279)) uid:1000 ino:77941 sk:d <-> 
tcp   LISTEN 0      128          0.0.0.0:9870       0.0.0.0:*    users:(("java",pid=14418,fd=274)) uid:1000 ino:70244 sk:e <-> 
tcp   LISTEN 0      128          0.0.0.0:8088       0.0.0.0:*    users:(("java",pid=15007,fd=289)) uid:1000 ino:78820 sk:10 <->
tcp   LISTEN 0      128          0.0.0.0:13562      0.0.0.0:*    users:(("java",pid=15158,fd=327)) uid:1000 ino:89419 sk:11 <->
2) Visit centos202:9870 to view the Hadoop dashboard (the VM hostname is centos202; you can also use its IP address)

3) Visit centos202:8088 to view the Hadoop cluster overview

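You can also verify the UIs respond from the VM itself without a browser (a sketch):
# Headless check of the NameNode and ResourceManager web UIs (-L follows redirects)
curl -sL -o /dev/null -w "%{http_code}\n" http://localhost:9870/
curl -sL -o /dev/null -w "%{http_code}\n" http://localhost:8088/
# expected: 200 for each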
【6.5】Create an HDFS directory
[hadoop@centos202 ~]$ hadoop fs -mkdir /test
[hadoop@centos202 ~]$ 
[hadoop@centos202 ~]$ hadoop fs -ls /
drwxr-xr-x   - hadoop supergroup          0 2023-03-05 05:29 /test
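To round out the check, write a file into /test and read it back (a sketch; the file name hello.txt is made up for illustration):
# Copy a local file into HDFS and print its contents
echo "hello hdfs" > /tmp/hello.txt
hadoop fs -put /tmp/hello.txt /test/
hadoop fs -cat /test/hello.txt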
【Supplement】Stop the Hadoop services: stop HDFS and YARN
[hadoop@centos202 ~]$ stop-dfs.sh
Stopping namenodes on [localhost]
Stopping datanodes
Stopping secondary namenodes [centos202]
[hadoop@centos202 ~]$ stop-yarn.sh
Stopping nodemanagers
Stopping resourcemanager
[hadoop@centos202 ~]$ 
【7】Install HBase
【7.1】Download and install HBase
For HBase packages, see http://apache.mirror.gtcomm.net/hbase/
The version used in this article is hbase-2.4.15; you can use wget, or download through a proxy to your local machine and transfer it to CentOS with rz (the latter is used here);
Extract the archive
sudo tar -xzvf hbase-2.4.15-bin.tar.gz
[hadoop@centos202 hbase-2.4.15]$ pwd
/usr/local/hbase/hbase-2.4.15
[hadoop@centos202 hbase-2.4.15]$ ls
bin  CHANGES.md  conf  docs  hbase-webapps  LEGAL  lib  LICENSE.txt  NOTICE.txt  README.txt  RELEASENOTES.md
Update the $PATH environment variable
cat <<EOF | sudo tee /etc/profile.d/hadoop_java.sh
export JAVA_HOME=\$(dirname \$(dirname \$(readlink \$(readlink \$(which javac)))))
export HADOOP_HOME=/usr/local/hadoop/hadoop-3.2.4
export HADOOP_HDFS_HOME=\$HADOOP_HOME
export HADOOP_MAPRED_HOME=\$HADOOP_HOME
export YARN_HOME=\$HADOOP_HOME
export HADOOP_COMMON_HOME=\$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=\$HADOOP_HOME/lib/native
export HBASE_HOME=/usr/local/hbase/hbase-2.4.15
export PATH=\$PATH:\$JAVA_HOME/bin:\$HADOOP_HOME/bin:\$HADOOP_HOME/sbin:\$HBASE_HOME/bin
EOF
Refresh the shell environment variables and verify HBASE_HOME
[hadoop@centos202 hbase-2.4.15]$ source /etc/profile.d/hadoop_java.sh
[hadoop@centos202 conf]$ echo $HBASE_HOME
/usr/local/hbase/hbase-2.4.15
Edit hbase-env.sh and set JAVA_HOME
[hadoop@centos202 conf]$ pwd
/usr/local/hbase/hbase-2.4.15/conf
[hadoop@centos202 conf]$ 
[hadoop@centos202 conf]$ 
[hadoop@centos202 conf]$ vim hbase-env.sh
Change line 28 to:
export JAVA_HOME=$(dirname $(dirname $(readlink $(readlink $(which javac)))))
【7.2】Configure HBase (standalone install)
1) Configure HBase just as you configured Hadoop; all HBase configuration files are under /usr/local/hbase/hbase-2.4.15/conf;
2) In standalone mode, all daemons (HMaster, HRegionServer, ZooKeeper) run in a single JVM on one machine;
【7.2.1】Create the HBase root directories
[hadoop@centos202 conf]$ sudo mkdir -p /hadoop/hbase/hfile
[hadoop@centos202 conf]$ sudo mkdir -p /hadoop/zookeeper
[hadoop@centos202 conf]$ sudo chown -R hadoop:hadoop /hadoop/
【7.2.2】Edit the hbase-site.xml file
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:/hadoop/hbase/hfile</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/hadoop/zookeeper</value>
  </property>
</configuration>
【Supplement】By default, unless you configure hbase.rootdir, your data is still stored under /tmp/;
【7.3】Start HBase
Start HBase:
[hadoop@centos202 conf]$ start-hbase.sh 
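To confirm the daemon came up, jps (shipped with the JDK) lists the local Java processes (a sketch; in standalone mode a single HMaster JVM runs everything):
# Expect an HMaster process after start-hbase.sh
jps
# e.g. 16712 HMaster   (the pid will differ)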
【Supplement】For installing HBase on a cluster, see option 2 at https://computingforgeeks.com/how-to-install-apache-hadoop-hbase-on-centos-7/ .
【7.4】Managing HMaster and HRegionServer (for reference only)
The HMaster server controls the HBase cluster. You can start up to 9 backup HMaster servers, for 10 in total.
HRegionServer manages the data in its StoreFiles as directed by the HMaster. Typically, one HRegionServer runs on each node of the cluster.
HMaster and HRegionServer instances are started and stopped with local-master-backup.sh and local-regionservers.sh respectively, as follows.
local-master-backup.sh start 2 # start a backup HMaster
local-regionservers.sh start 3 # start an additional RegionServer
local-regionservers.sh stop 3 # stop that RegionServer
【Supplement】
Each HMaster uses two ports (16000 and 16010 by default).
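For instance, you can check the default HMaster web UI port the same way the Hadoop ports were listed above (a sketch; backup masters listen on ports offset from these defaults):
# Confirm the HMaster web UI is listening on its default port
ss -tnlp | grep 16010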
【8】Start the HBase shell
Hadoop and HBase should be running before you launch the HBase shell, as follows:
start-dfs.sh 
start-yarn.sh  
start-hbase.sh
【Supplement】start-all.sh can be used in place of start-dfs.sh and start-yarn.sh

Start the HBase shell:
hbase shell 
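Inside the shell, a minimal session verifies the install end to end (a sketch; the table name 't1' and column family 'cf1' are made up for illustration):
create 't1', 'cf1'                # create table t1 with one column family
put 't1', 'row1', 'cf1:a', 'v1'   # write a single cell
scan 't1'                         # read the table back
disable 't1'                      # a table must be disabled before dropping
drop 't1'                         # clean up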
Stop HBase
[hadoop@centos202 conf]$ stop-hbase.sh 
