3

I am trying to create a file or directory in HDFS using Python. To be clear, I am running a Hadoop streaming job whose mapper is written in Python, and the mapper itself needs to create a file in HDFS. I have read that several Python frameworks can do this, but I would like to stay with Hadoop streaming. Is there a way to accomplish this from within a Hadoop streaming job?

4 Answers

1

You can run HDFS commands from a Python script:

import subprocess

def run_cmd(args_list):
    # Run the command and capture its stdout and stderr
    proc = subprocess.Popen(args_list, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    (output, errors) = proc.communicate()
    # A non-zero return code means the command failed
    if proc.returncode:
        raise RuntimeError('Error run_cmd')
    return (output, errors)

Then call it:

(out, errors) = run_cmd(['hdfs', 'dfs', '-mkdir', path_HDFS_to_create_folder])
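Since the question is about a streaming mapper, here is a minimal sketch of how such a call might sit inside one. It assumes the `hdfs` executable is on the PATH of the task nodes. The names `hdfs_cmd` and `mapper`, the `runner` and `out` parameters, and the marker path are illustrative and not part of the original answer. The command's output is captured rather than printed, because in Hadoop streaming everything the mapper writes to stdout becomes job output.

```python
import subprocess
import sys


def hdfs_cmd(*args):
    """Build the argument list for an `hdfs dfs` invocation."""
    return ["hdfs", "dfs", *args]


def run_cmd(args_list, runner=subprocess.run):
    """Run a command with its output captured; return the exit code."""
    proc = runner(args_list, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    if proc.returncode != 0:
        # Diagnostics go to stderr so they do not mix with mapper output.
        print("command failed: " + " ".join(args_list), file=sys.stderr)
    return proc.returncode


def mapper(lines, marker_path, runner=subprocess.run, out=sys.stdout):
    """Create an empty file at marker_path, then pass input lines through."""
    status = run_cmd(hdfs_cmd("-touchz", marker_path), runner)
    for line in lines:
        out.write(line)
    return status
```

To use this as the streaming mapper, the script would end with something like `sys.exit(mapper(sys.stdin, "/tmp/streaming_demo/_marker"))`, where the path is a placeholder. Several map tasks may run at once, so a per-task file name is probably safer than one shared path.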

0

There is no way to create a file with a Python script, but it is possible to create a directory using pydoop or snakebite.

See: https://www.geeksforgeeks.org/creating-files-in-hdfs-using-python-snakebite/

4 Comments

Yes, it is possible to create a file using: (ret, out, err) = run_cmd(['hdfs', 'dfs', '-touchz', filename])
Yes, but no. It's possible with pip install hdfs not subprocess - pypi.org/project/hdfs
it's not about that
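For reference, the `hdfs` package mentioned in the comment above (HdfsCLI on PyPI) talks to the cluster over WebHDFS instead of spawning a subprocess. The following is only a sketch: it assumes WebHDFS is enabled on the NameNode, and the helper names `connect` and `write_hdfs_file`, as well as any URL or user passed to them, are hypothetical.

```python
import posixpath


def connect(url, user):
    """Return a WebHDFS client. Requires the third-party package: pip install hdfs."""
    from hdfs import InsecureClient  # imported lazily so the helper below has no hard dependency
    return InsecureClient(url, user=user)


def write_hdfs_file(client, path, text, overwrite=False):
    """Create the parent directory if needed, then write text to path."""
    parent = posixpath.dirname(path)
    if parent:
        client.makedirs(parent)
    client.write(path, data=text, encoding="utf-8", overwrite=overwrite)
```

A call might look like `write_hdfs_file(connect("http://namenode:9870", "hadoop"), "/user/hadoop/out/part.txt", "hello\n")`, where the host, port and user are placeholders. The NameNode web port is typically 9870 on Hadoop 3 and 50070 on Hadoop 2.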
0

It is possible to create a file using:

from subprocess import Popen, PIPE

# Helper that runs a native Hadoop/Linux command
def run_cmd(args_list):
    """
    Run a Linux command and return (return code, stdout, stderr).
    """
    print('Running system command: {0}'.format(' '.join(args_list)))
    proc = Popen(args_list, stdout=PIPE, stderr=PIPE)
    s_output, s_err = proc.communicate()
    s_return = proc.returncode
    return s_return, s_output, s_err

(ret, out, err) = run_cmd(['hdfs', 'dfs', '-touchz', filename])
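The helper above returns the exit code without acting on it. As a hedged sketch of how a caller might use it, the function below first checks for the file with `hdfs dfs -test -e` (exit code 0 when the path exists) and only then runs `-touchz`. The name `ensure_file` and the `popen` parameter are illustrative additions, not part of the answer.

```python
from subprocess import PIPE, Popen


def run_cmd(args_list, popen=Popen):
    """Run a command and return (return code, stdout, stderr)."""
    proc = popen(args_list, stdout=PIPE, stderr=PIPE)
    s_output, s_err = proc.communicate()
    return proc.returncode, s_output, s_err


def ensure_file(path, popen=Popen):
    """Create an empty HDFS file unless it exists. Return True if it was created."""
    ret, _, _ = run_cmd(["hdfs", "dfs", "-test", "-e", path], popen)
    if ret == 0:
        return False  # the path is already there
    ret, _, err = run_cmd(["hdfs", "dfs", "-touchz", path], popen)
    if ret != 0:
        raise RuntimeError("touchz failed: " + err.decode(errors="replace"))
    return True
```

The check and the create are two separate commands, so this is not atomic: two tasks could both see the file as missing and both run `-touchz`.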

1 Comment

Please edit your other answer(s) rather than post multiple different ones
0

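Solution using subprocess, inspired by this answer in the "Create HDFS file" question.

from subprocess import Popen, PIPE

# run_cmd is the helper from that answer: it wraps Popen and
# returns the return code, stdout and stderr
(ret, out, err) = run_cmd(['hdfs', 'dfs', '-touchz', '/directory/filename'])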
