Discussion:
Why does sbin/start-dfs.sh not exit 1 when a node cannot be connected?
x***@iluvatar.ai.INVALID
2018-11-13 07:06:38 UTC
Hi!

I use systemd to control the HDFS service, and on reboot I see the following issue:

Nov 13 14:47:58 test-storage.novalocal start-dfs.sh[1915]: Starting namenodes on [test-storage.novalocal]
Nov 13 14:48:03 test-storage.novalocal start-dfs.sh[1915]: test-storage.novalocal: starting namenode, logging to /home/hdfs/hadoop/logs/hadoop-hdfs-namenode-test-storage.novalocal.ou
Nov 13 14:48:06 test-storage.novalocal start-dfs.sh[1915]: 192.168.0.17: ssh: connect to host 192.168.0.17 port 22: No route to host
Nov 13 14:48:07 test-storage.novalocal start-dfs.sh[1915]: 192.168.0.16: starting datanode, logging to /home/hdfs/hadoop/logs/hadoop-hdfs-datanode-test-storage.novalocal.out
Nov 13 14:48:09 test-storage.novalocal start-dfs.sh[1915]: Starting secondary namenodes [0.0.0.0]
Nov 13 14:48:14 test-storage.novalocal start-dfs.sh[1915]: 0.0.0.0: starting secondarynamenode, logging to /home/hdfs/hadoop/logs/hadoop-hdfs-secondarynamenode-test-storage.novaloc
Nov 13 14:48:16 test-storage.novalocal systemd[1]: Started skydiscovery hdfs service.

start-dfs.sh returns 0 even though one datanode could not be reached over ssh.

Is there any parameter that makes start-dfs.sh exit 1 in this case?

Or is there a better way to manage the HDFS service with systemd?
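
If there is no built-in option, the only workaround I can think of is a
wrapper script that scans the output, roughly like this (a rough sketch;
the grep pattern is my own assumption based on the log above):

    #!/usr/bin/env bash
    # Rough sketch: run start-dfs.sh, keep its output, and exit 1 if
    # any ssh connection error shows up (the pattern is an assumption).
    out=$("$HADOOP_HOME"/sbin/start-dfs.sh 2>&1)
    rc=$?
    echo "$out"
    echo "$out" | grep -q "ssh: connect to host" && exit 1
    exit $rc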

Thanks
Elek, Marton
2018-11-15 11:07:01 UTC
Hi,

start-dfs.sh can start multiple services (namenode, datanode, ...) on
multiple hosts, while systemd usually manages a single service on
localhost.

I would create separate systemd units for the namenode, datanode, etc.,
and use the 'hdfs namenode' and 'hdfs datanode' commands.
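
For example, a namenode unit could look roughly like the sketch below.
The User, paths and environment are only guesses based on your log
output; 'hdfs namenode' stays in the foreground, so systemd can
supervise the process directly:

    # /etc/systemd/system/hdfs-namenode.service (sketch; User, paths
    # and JAVA_HOME are guesses based on the logs in this thread)
    [Unit]
    Description=HDFS NameNode
    After=network-online.target
    Wants=network-online.target

    [Service]
    Type=simple
    User=hdfs
    Environment=JAVA_HOME=/usr/lib/jvm/java
    ExecStart=/home/hdfs/hadoop/bin/hdfs namenode
    Restart=on-failure

    [Install]
    WantedBy=multi-user.target

A datanode unit would be the same with 'hdfs datanode' as the
ExecStart. Each host then runs only its own units, so a node that is
down fails its own unit instead of being hidden behind start-dfs.sh's
exit code.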

Marton