0

I am running Hadoop sample application given in 'Hadoop in Action'by Chuck Lam on Win 7 notebook on Cygwin environment. Python is installed on Cygwin and sample python application running. When I run hadoop streaming application it is throwing following error. Following is command

"bin/hadoop jar contrib/streaming/hadoop-streaming-1.2.1.jar -D mapred.reduce.tasks=1 -input input/cite75_99.txt -output output -mapper 'RandomSample.py 10' -file RandomSample.py

RandomSample.py is simple application for filtering input.

The following error is thrown:

java.io.IOException: Cannot run program "C:\cygwin64\home\RajS1\hadoop-1.2.1\.\RandomSample.py": CreateProcess error=193, %1 is not a valid Win32 application
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1041)
        at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:214)
        at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
        at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
        at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)
Caused by: java.io.IOException: CreateProcess error=193, %1 is not a valid Win32 application
        at java.lang.ProcessImpl.create(Native Method)
        at java.lang.ProcessImpl.<init>(ProcessImpl.java:376)
        at java.lang.ProcessImpl.start(ProcessImpl.java:136)
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1022)

And when I run following command then also it throws similar error. My guess streaming application should execute python application but it is trying to execute this as java application Please suggest solution. Thanks in advance

2
  • "Python is installed on Cygwin" -- perhaps you need the windows python installed directly. Commented Oct 7, 2013 at 16:30
  • When I run Python script with streaming with Unix command then it runs properly. It seems Hadoop- streaming API is not able to run / execute Python script as it is considering it as Java program Commented Oct 9, 2013 at 1:46

1 Answer 1

2

you might want to try this option as well

bin/hadoop jar contrib/streaming/hadoop-streaming-1.2.1.jar -D mapred.reduce.tasks=1 -input input/cite75_99.txt -output output -mapper 'python RandomSample.py 10' -file RandomSample.py

If you try this, you might not get this error. But you might get access denied 'python RandomSample.py' , try to give the full path to the python exe. like

bin/hadoop jar contrib/streaming/hadoop-streaming-1.2.1.jar -D mapred.reduce.tasks=1 -input input/cite75_99.txt -output output -mapper 'c:\mfiles\python RandomSample.py 10' -file RandomSample.py

good luck

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.