
I'm trying to get Hadoop and Hive to run locally on my Linux system, but when I run jps, I notice that the DataNode service is missing:

    vaughn@vaughn-notebook:/usr/local/hadoop$ jps
    2209 NameNode
    2682 ResourceManager
    3084 Jps
    2510 SecondaryNameNode

If I run bin/hadoop datanode, the following error occurs:

    17/07/13 19:40:14 INFO datanode.DataNode: registered UNIX signal handlers for [TERM, HUP, INT]
    17/07/13 19:40:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    17/07/13 19:40:15 WARN datanode.DataNode: Invalid dfs.datanode.data.dir /home/cloudera/hdata/dfs/data : 
    ExitCodeException exitCode=1: chmod: changing permissions of '/home/cloudera/hdata/dfs/data': Operation not permitted

        at org.apache.hadoop.util.Shell.runCommand(Shell.java:559)
        at org.apache.hadoop.util.Shell.run(Shell.java:476)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:723)
        at org.apache.hadoop.util.Shell.execCommand(Shell.java:812)
        at org.apache.hadoop.util.Shell.execCommand(Shell.java:795)
        at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:646)
        at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:479)
        at org.apache.hadoop.util.DiskChecker.mkdirsWithExistsAndPermissionCheck(DiskChecker.java:140)
        at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:156)
        at org.apache.hadoop.hdfs.server.datanode.DataNode$DataNodeDiskChecker.checkDir(DataNode.java:2285)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.checkStorageLocations(DataNode.java:2327)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2309)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2201)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2248)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2424)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2448)
    17/07/13 19:40:15 FATAL datanode.DataNode: Exception in secureMain
    java.io.IOException: All directories in dfs.datanode.data.dir are invalid: "/home/cloudera/hdata/dfs/data/" 
        at org.apache.hadoop.hdfs.server.datanode.DataNode.checkStorageLocations(DataNode.java:2336)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2309)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2201)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2248)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2424)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2448)
    17/07/13 19:40:15 INFO util.ExitUtil: Exiting with status 1
    17/07/13 19:40:15 INFO datanode.DataNode: SHUTDOWN_MSG: 
    /************************************************************
    SHUTDOWN_MSG: Shutting down DataNode at vaughn-notebook/127.0.1.1
    ************************************************************/

That directory seems unusual, but I don't think there's anything technically wrong with it. Here are the permissions on the directory:

    vaughn@vaughn-notebook:/usr/local/hadoop$ ls -ld /home/cloudera/hdata/dfs/data
    drwxrwxrwx 2 root root 4096 Jul 13 19:14 /home/cloudera/hdata/dfs/data

I also removed everything in the tmp folder and formatted the HDFS NameNode. Here is my hdfs-site.xml file:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Default block replication.
    The actual number of replications can be specified when the file is created.
    The default is used if replication is not specified in create time.
    </description>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/cloudera/hdata/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/cloudera/hdata/dfs/data</value>
  </property>
</configuration>

And my core-site.xml file:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/cloudera/hdata</value>
  </property>
</configuration>

In my googling, I've seen some suggest running "sudo chown hduser:hadoop -R /usr/local/hadoop_store", but when I do that I get the error "chown: invalid user: ‘hduser:hadoop’". Do I have to create this user and group? I'm not really familiar with the process. Thanks in advance for any assistance.

1 Comment

  • What's the user account you are using to start/stop the Hadoop services? (Commented Jul 14, 2017 at 0:53)

4 Answers


1. sudo chown vaughn:hadoop -R /usr/local/hadoop_store

   where hadoop is the group name. Use

   grep vaughn /etc/group

   in your terminal to see your group name.

2. Clean the temporary directories.

3. Format the NameNode.

Hope this helps.
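The steps above can be sketched as a small pre-flight check plus the fix itself. This is only an illustration: the data directory path comes from the question, and `check_data_dir` is a made-up helper name, not anything Hadoop ships.

```shell
#!/bin/sh
# The DataNode calls chmod on dfs.datanode.data.dir at startup, so the
# directory must be *owned* by the account that runs Hadoop -- mode 777
# alone is not enough (chmod by a non-owner fails, as in the question).
check_data_dir() {
    dir=$1
    # stat -c %U prints the owning user (GNU coreutils)
    owner=$(stat -c %U "$dir" 2>/dev/null)
    if [ "$owner" = "$(id -un)" ] && [ -w "$dir" ]; then
        echo "OK: $dir is owned and writable by $owner"
    else
        echo "FIX: sudo chown -R $(id -un) $dir"
    fi
}

check_data_dir /home/cloudera/hdata/dfs/data
```

If the check prints FIX, run the suggested chown, clean the temporary directories, re-run `hdfs namenode -format`, and start HDFS again with `start-dfs.sh`.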


1 Comment

Thanks, this did it!

Looks like a permission issue: the user that starts the DataNode must have write access to the directories listed in dfs.datanode.data.dir.

Try executing the command below before starting the DataNode service:

sudo chmod -R 777 /home/cloudera/hdata/dfs

You can also update the owner and group using the chown command; that's the better option.

Edit

If the DataNode still fails to start, update the file ownership using the command below before starting it again:

sudo chown -R vaughn:root /home/cloudera/hdata/dfs
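To see whether the ownership change actually took effect before retrying, you can inspect the directory; the path is the one from the question, and `stat -c` is the GNU coreutils form.

```shell
# Show owner, group, and mode of the DataNode data directory.
# The owner should be the account that runs start-dfs.sh, not root.
stat -c 'owner=%U group=%G mode=%a' /home/cloudera/hdata/dfs/data \
  || echo "directory does not exist yet"
```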

1 Comment

Thanks for your replies, sachin. I did already set the permissions on the dfs folder to 777; I think I forgot to mention that. I tried it again just to be sure, and the DataNode still does not start. The account I use to start the services is called 'vaughn', but I'm not sure how I would use chown in this context.

1. sudo chown -R vaughn:hadoop /usr/local/hadoop_store

2. Delete the datanode and namenode directories in hadoop_store.

3. stop-dfs.sh and stop-yarn.sh

4. hdfs namenode -format

5. start-dfs.sh and start-yarn.sh

Hope it'll help.



One more possible reason, which was the case for me: the HDFS directory path shown in the folder properties contained my user name twice, i.e. home/hadoop/hadoop/hdfs, and I had added that same doubled path in hdfs-site.xml. As a fix, I removed the extra hadoop/ segment, changed the value to home/hadoop/hdfs, and this resolved my problem.
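A quick way to catch a doubled path segment like this is to print the value actually configured and compare it with what exists on disk. A minimal sketch, assuming the /usr/local/hadoop layout from the question and that the `<value>` sits on the line after the `<name>` line; if the `hdfs` command is on your PATH, `hdfs getconf -confKey dfs.datanode.data.dir` does the same job.

```shell
# Extract the <value> that follows the dfs.datanode.data.dir <name> line.
conf=/usr/local/hadoop/etc/hadoop/hdfs-site.xml
[ -f "$conf" ] \
  && sed -n '/dfs.datanode.data.dir/{n;s/.*<value>\(.*\)<\/value>.*/\1/p;}' "$conf" \
  || echo "hdfs-site.xml not found at $conf"
```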
