
I want to fetch stocks' end-of-day (EOD) prices daily from Yahoo/Google Finance. These prices should be stored directly in a file on HDFS.

I can later create an external table on top of it (using Hive) and use it for further analysis.

So I am not looking for a basic MapReduce job, since I don't have any input file as such. Are there any connectors available in Python that can write data to Hadoop?

1 Answer

Start by dumping your data into a local file, then find a way to upload that file to HDFS.
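For the first step, a minimal sketch of dumping rows to a local file with the standard `csv` module — the ticker symbols and prices below are made-up placeholders standing in for whatever you actually fetched from Yahoo/Google Finance:

```python
import csv

# Hypothetical EOD rows (date, symbol, close) fetched earlier
# from your data source -- replace with your real data.
rows = [
    ("2016-01-04", "AAPL", 105.35),
    ("2016-01-04", "GOOG", 741.84),
]

# Dump the data to a local file first; upload it to HDFS afterwards.
with open("data.txt", "w", newline="") as f:
    writer = csv.writer(f)
    for row in rows:
        writer.writerow(row)
```

A plain delimited text file like this is also exactly what a Hive external table can sit on later.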

  • If you are running your job on an "edge node" (i.e. a Linux box that is not part of the cluster but has all the Hadoop clients installed and configured), then you have the good old HDFS command-line interface:

hdfs dfs -put data.txt /user/johndoe/some/hdfs/dir/
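Since the question asks about Python specifically: on an edge node the simplest "connector" is just shelling out to that same CLI. A sketch, where `hdfs_put` is a hypothetical helper and the paths are the ones from the command above:

```python
import subprocess

def put_command(local_path, hdfs_dir):
    # Build the same CLI invocation as "hdfs dfs -put ...".
    return ["hdfs", "dfs", "-put", local_path, hdfs_dir]

def hdfs_put(local_path, hdfs_dir):
    """Upload a local file to HDFS by shelling out to the Hadoop CLI.

    Only works where the Hadoop clients are installed and configured
    (i.e. on an edge node).
    """
    result = subprocess.run(
        put_command(local_path, hdfs_dir),
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        raise RuntimeError("hdfs put failed: " + result.stderr)

# Usage (on an edge node):
# hdfs_put("data.txt", "/user/johndoe/some/hdfs/dir/")
```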

  • If you are running your job anywhere else, use an HTTP library (or the good old curl command line) to connect to the HDFS REST service -- this could be either WebHDFS or HttpFS, depending on how the cluster has been set up -- and upload the file with a PUT request:

http://namenode:port/webhdfs/v1/user/johndoe/some/hdfs/dir/data.txt?op=CREATE&overwrite=false

(with the content of "data.txt" as the payload, of course)
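In Python, note that WebHDFS file creation is a two-step dance: the NameNode answers the first PUT with a 307 redirect pointing at a DataNode, and a second PUT sends the actual content to that DataNode. A sketch using only the standard library -- the host name `namenode`, port, HDFS path, and user are placeholders to adjust for your cluster:

```python
import http.client
import urllib.parse

def webhdfs_create_path(hdfs_path, user):
    # Request path for the WebHDFS CREATE operation, matching the URL above.
    return "/webhdfs/v1{}?op=CREATE&overwrite=false&user.name={}".format(
        hdfs_path, user
    )

def webhdfs_upload(namenode, port, hdfs_path, user, data):
    # Step 1: PUT to the NameNode; it replies 307 with the DataNode URL
    # in the Location header (http.client never follows redirects itself).
    nn = http.client.HTTPConnection(namenode, port)
    nn.request("PUT", webhdfs_create_path(hdfs_path, user))
    location = nn.getresponse().getheader("Location")
    nn.close()
    # Step 2: PUT the file content to the DataNode we were redirected to.
    loc = urllib.parse.urlsplit(location)
    dn = http.client.HTTPConnection(loc.netloc)
    dn.request("PUT", loc.path + "?" + loc.query, body=data)
    status = dn.getresponse().status
    dn.close()
    return status  # 201 Created on success

# Usage (against a live cluster):
# with open("data.txt", "rb") as f:
#     webhdfs_upload("namenode", 50070,
#                    "/user/johndoe/some/hdfs/dir/data.txt", "johndoe", f)
```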


2 Comments

BTW: when using a REST service against an HA cluster, you must try each NameNode until you find the active one.
BTW: when using a REST service against a secure cluster, you must set up Kerberos SPNEGO authentication -- and optionally store the Hadoop delegation token for the duration of the session.
