import io
import sys

sys.stdin = io.StringIO("workmen,hdfs://localhost:54310/hadoop_test/text_files/file1.txt    1\n workmen,hdfs://localhost:54310/hadoop_test/text_files/lab_exercise6_file1.txt   1\n workmen,hdfs://localhost:54310/hadoop_test/text_files/lab_exercise6_file2.txt   1\n workmen,hdfs://localhost:54310/hadoop_test/text_files/lab_exercise6_file4.txt   1\n workmen,hdfs://localhost:54310/hadoop_test/text_files/lab_exercise6_file5.txt   1\n workno,hdfs://localhost:54310/hadoop_test/text_files/lab_exercise6_file3.txt    1\n works,hdfs://localhost:54310/hadoop_test/text_files/file1.txt   33\n works,hdfs://localhost:54310/hadoop_test/text_files/lab_exercise6_file1.txt    33\n works,hdfs://localhost:54310/hadoop_test/text_files/lab_exercise6_file2.txt    34")

for each_line in sys.stdin:

    each_line = each_line.strip()
    value,total_num_words = each_line.split('\t',1)

    print(value)  # prints nothing; the code just runs without error

I have a string of text that I assigned to sys.stdin. I would like to read each line and extract the word (e.g. workmen), the filename (e.g. hdfs://localhost:54310/hadoop_test/text_files/file1.txt), and the count (e.g. 1 in the first case). However, when I try to debug by printing value, nothing is printed in Jupyter. I guess it is a variable scope issue, or the loop is not running at all. Is there a workaround?

  • Why is it assigned to sys.stdin? Is there a specific reason? Commented Jun 21, 2020 at 11:11
  • I am working in Hadoop, and the Java API only takes sys.stdin as input. Commented Jun 21, 2020 at 11:14
  • I can't reproduce it, it prints as expected with your code. Commented Jun 21, 2020 at 11:25

1 Answer

Your input probably does not contain tab characters. Splitting on any whitespace instead works flawlessly:

from io import StringIO

string = StringIO("""workmen,hdfs://localhost:54310/hadoop_test/text_files/file1.txt    1
 workmen,hdfs://localhost:54310/hadoop_test/text_files/lab_exercise6_file1.txt   1
  workmen,hdfs://localhost:54310/hadoop_test/text_files/lab_exercise6_file2.txt   1
   workmen,hdfs://localhost:54310/hadoop_test/text_files/lab_exercise6_file4.txt   1
    workmen,hdfs://localhost:54310/hadoop_test/text_files/lab_exercise6_file5.txt   1
     workno,hdfs://localhost:54310/hadoop_test/text_files/lab_exercise6_file3.txt    1
      works,hdfs://localhost:54310/hadoop_test/text_files/file1.txt   33
       works,hdfs://localhost:54310/hadoop_test/text_files/lab_exercise6_file1.txt    33
        works,hdfs://localhost:54310/hadoop_test/text_files/lab_exercise6_file2.txt    34""")

for each_line in string:
    each_line = each_line.strip()
    # split() with no argument splits on any run of whitespace (spaces or tabs)
    value, total_num_words = each_line.split()

    print(value)  # the "word,filename" part of each line

This yields

workmen,hdfs://localhost:54310/hadoop_test/text_files/file1.txt
workmen,hdfs://localhost:54310/hadoop_test/text_files/lab_exercise6_file1.txt
workmen,hdfs://localhost:54310/hadoop_test/text_files/lab_exercise6_file2.txt
workmen,hdfs://localhost:54310/hadoop_test/text_files/lab_exercise6_file4.txt
workmen,hdfs://localhost:54310/hadoop_test/text_files/lab_exercise6_file5.txt
workno,hdfs://localhost:54310/hadoop_test/text_files/lab_exercise6_file3.txt
works,hdfs://localhost:54310/hadoop_test/text_files/file1.txt
works,hdfs://localhost:54310/hadoop_test/text_files/lab_exercise6_file1.txt
works,hdfs://localhost:54310/hadoop_test/text_files/lab_exercise6_file2.txt
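
To check whether missing tabs are really the problem, a quick sketch like the following may help. It assumes the data is separated by spaces, as in the sample line copied from the question; the names line, parts_tab, key, and count are only illustrative.

```python
# Sample line from the question, separated by spaces (assumption: no tab).
line = "workmen,hdfs://localhost:54310/hadoop_test/text_files/file1.txt    1"

# With no tab in the line, split('\t', 1) returns the whole line as one item,
# so unpacking the result into two names would raise ValueError.
print('\t' in line)          # False
parts_tab = line.split('\t', 1)
print(len(parts_tab))        # 1

# rsplit(None, 1) splits once, from the right, on any run of whitespace,
# so it works for both spaces and tabs.
key, count = line.rsplit(None, 1)
print(key)
print(count)
```

If the real Hadoop output is tab-separated, split('\t', 1) would work there; the whitespace-based split simply tolerates both cases.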

1 Comment

Actually, my intended output is the word (e.g. workmen) and total_num_words (e.g. 1). From this tuple of word, filename, and total_num_words, how can I extract the word and its count (which I thought was tab-separated)?
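
One possible way to get the word and the count is sketched below. It assumes each line has the shape word,filename followed by whitespace and an integer count, and that the word itself contains no comma. parse_line is a hypothetical helper written for this illustration, not part of any Hadoop API.

```python
from io import StringIO


def parse_line(line):
    """Return (word, filename, count), or None for a blank line.

    Hypothetical helper: assumes 'word,filename<whitespace>count'.
    """
    line = line.strip()
    if not line:
        return None
    # The count is the last whitespace-separated field.
    key, count = line.rsplit(None, 1)
    # Split on the first comma only, so the rest of the key stays intact.
    word, filename = key.split(",", 1)
    return word, filename, int(count)


# Two sample lines taken from the question.
sample = StringIO(
    "workmen,hdfs://localhost:54310/hadoop_test/text_files/file1.txt    1\n"
    "works,hdfs://localhost:54310/hadoop_test/text_files/file1.txt   33\n"
)

records = [r for r in map(parse_line, sample) if r is not None]
for word, filename, count in records:
    print(word, count)
```

In a real job the loop would read from sys.stdin instead of the sample StringIO; the parsing itself would stay the same.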
