Update yarn-site.xml and sync the configuration to the other nodes (the whole <configuration> block is pasted here; compare it against your previous version):
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <!-- Enable ResourceManager HA -->
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <!-- Declare the addresses of the two ResourceManagers -->
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>cluster-yarn1</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>bigdata166</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>bigdata167</value>
  </property>
  <!-- Specify the ZooKeeper quorum address -->
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>bigdata166:2181,bigdata167:2181,bigdata168:2181</value>
  </property>
  <!-- Enable automatic recovery -->
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
  </property>
  <!-- Store ResourceManager state in the ZooKeeper cluster -->
  <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
  </property>
</configuration>
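The updated file has to reach every node. One way to push it is a small scp loop; the install path in HADOOP_CONF and the target hostnames below are assumptions for this cluster, so adjust them to your environment:

```shell
# Sketch: distribute yarn-site.xml to the other nodes.
# HADOOP_CONF is an assumed install path; change it to your location.
HADOOP_CONF=${HADOOP_CONF:-/opt/module/hadoop/etc/hadoop}
for host in bigdata167 bigdata168; do
  if command -v scp >/dev/null 2>&1 && [ -f "$HADOOP_CONF/yarn-site.xml" ]; then
    scp "$HADOOP_CONF/yarn-site.xml" "$host:$HADOOP_CONF/"
  else
    # Dry-run fallback when the file or scp is unavailable.
    echo "would run: scp $HADOOP_CONF/yarn-site.xml $host:$HADOOP_CONF/"
  fi
done
```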
Start HDFS (the steps marked in blue can be skipped if they were already performed and the services are running).
On each JournalNode host, run the following command to start the journalnode service:
sbin/hadoop-daemon.sh start journalnode
On [nn1], format the NameNode and start it:
bin/hdfs namenode -format
sbin/hadoop-daemon.sh start namenode
On [nn2], sync the metadata from nn1:
bin/hdfs namenode -bootstrapStandby
Start the NameNode on [nn2]:
sbin/hadoop-daemon.sh start namenode
Start all DataNodes:
sbin/hadoop-daemons.sh start datanode
Switch [nn1] to Active:
bin/hdfs haadmin -transitionToActive nn1
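To confirm the roles after the transition, both NameNodes can be queried; this is a sketch that reuses the nn1/nn2 service ids from this setup and falls back to printing the command when hdfs is not on the PATH:

```shell
# Sketch: check the state of both NameNodes after the transition.
for nn in nn1 nn2; do
  if command -v hdfs >/dev/null 2>&1; then
    echo "$nn: $(hdfs haadmin -getServiceState "$nn")"
  else
    # Dry-run fallback when hdfs is not installed locally.
    echo "would run: hdfs haadmin -getServiceState $nn"
  fi
done
```

On this cluster, nn1 should report active and nn2 standby.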
Start YARN
On bigdata166, run:
sbin/start-yarn.sh
On bigdata167, start the second ResourceManager:
sbin/yarn-daemon.sh start resourcemanager
Check the ResourceManager service state:
bin/yarn rmadmin -getServiceState rm1
Test:
Kill the ResourceManager on bigdata166,
then check rm2's state: it should become active.
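The failover test above can be sketched as follows; the jps output format ("<pid> ResourceManager") is standard, and the block is guarded so it does nothing on a machine without a running ResourceManager:

```shell
# Sketch: kill the active ResourceManager (run on bigdata166), then poll rm2.
if command -v jps >/dev/null 2>&1; then
  # jps prints "<pid> <main-class>"; take the pid of the ResourceManager line.
  rm_pid=$(jps | awk '/ResourceManager$/ {print $1}')
  if [ -n "$rm_pid" ]; then
    kill -9 "$rm_pid"
    sleep 5
    bin/yarn rmadmin -getServiceState rm2   # should now print: active
  fi
fi
```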

A pitfall: at first I forgot to start ZooKeeper, so zkfc and the ResourceManager crashed right after starting; the logs showed an error about being unable to connect to ZooKeeper.
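To avoid that pitfall, start ZooKeeper on every node before start-dfs.sh/start-yarn.sh. A sketch using zkServer.sh, ZooKeeper's standard control script (having it on the PATH of each node, plus passwordless ssh, are assumptions):

```shell
# Sketch: start ZooKeeper on all three nodes before HDFS/YARN.
for host in bigdata166 bigdata167 bigdata168; do
  if command -v ssh >/dev/null 2>&1 && ssh -o ConnectTimeout=2 "$host" true 2>/dev/null; then
    ssh "$host" "zkServer.sh start"
  else
    # Dry-run fallback when the host is unreachable.
    echo "would run: ssh $host 'zkServer.sh start'"
  fi
done
# Then verify on each node with: zkServer.sh status
# (one node should report Mode: leader, the others Mode: follower)
```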
