Variable scope: failed to print inside a for loop

Question

import io
import sys

sys.stdin = io.StringIO("workmen,hdfs://localhost:54310/hadoop_test/text_files/file1.txt    1\n workmen,hdfs://localhost:54310/hadoop_test/text_files/lab_exercise6_file1.txt   1\n workmen,hdfs://localhost:54310/hadoop_test/text_files/lab_exercise6_file2.txt   1\n workmen,hdfs://localhost:54310/hadoop_test/text_files/lab_exercise6_file4.txt   1\n workmen,hdfs://localhost:54310/hadoop_test/text_files/lab_exercise6_file5.txt   1\n workno,hdfs://localhost:54310/hadoop_test/text_files/lab_exercise6_file3.txt    1\n works,hdfs://localhost:54310/hadoop_test/text_files/file1.txt   33\n works,hdfs://localhost:54310/hadoop_test/text_files/lab_exercise6_file1.txt    33\n works,hdfs://localhost:54310/hadoop_test/text_files/lab_exercise6_file2.txt    34")

for each_line in sys.stdin:

    each_line = each_line.strip()
    value,total_num_words = each_line.split('\t',1)

    print(value) #not returning anything the code just runs without error.

I have a string of text that I assigned to sys.stdin. I would like to read each line and extract the word (e.g. workmen), the filename (e.g. hdfs://localhost:54310/hadoop_test/text_files/file1.txt) and the count (e.g. 1 for the first line). However, when I try to debug by printing value, nothing is shown in Jupyter. I guess it's a variable scope issue, or the loop is not running at all. Is there any workaround for this?

I am working in Hadoop, and the Java API only takes sys.stdin as input. – CD_NS, Commented Jun 21, 2020 at 11:14

Jan · Accepted Answer · 2020-06-21 11:23:09Z

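Your input probably does not contain tab characters, so split('\t', 1) never finds a separator. Calling split() with no arguments splits on any run of whitespace (spaces or tabs) instead, and the following works: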

from io import StringIO

string = StringIO("""workmen,hdfs://localhost:54310/hadoop_test/text_files/file1.txt    1
 workmen,hdfs://localhost:54310/hadoop_test/text_files/lab_exercise6_file1.txt   1
  workmen,hdfs://localhost:54310/hadoop_test/text_files/lab_exercise6_file2.txt   1
   workmen,hdfs://localhost:54310/hadoop_test/text_files/lab_exercise6_file4.txt   1
    workmen,hdfs://localhost:54310/hadoop_test/text_files/lab_exercise6_file5.txt   1
     workno,hdfs://localhost:54310/hadoop_test/text_files/lab_exercise6_file3.txt    1
      works,hdfs://localhost:54310/hadoop_test/text_files/file1.txt   33
       works,hdfs://localhost:54310/hadoop_test/text_files/lab_exercise6_file1.txt    33
        works,hdfs://localhost:54310/hadoop_test/text_files/lab_exercise6_file2.txt    34""")

for each_line in string:
    each_line = each_line.strip()
    # split() with no arguments splits on any run of whitespace
    value, total_num_words = each_line.split()

    print(value)  # prints the "word,filename" part of each line

This yields

workmen,hdfs://localhost:54310/hadoop_test/text_files/file1.txt
workmen,hdfs://localhost:54310/hadoop_test/text_files/lab_exercise6_file1.txt
workmen,hdfs://localhost:54310/hadoop_test/text_files/lab_exercise6_file2.txt
workmen,hdfs://localhost:54310/hadoop_test/text_files/lab_exercise6_file4.txt
workmen,hdfs://localhost:54310/hadoop_test/text_files/lab_exercise6_file5.txt
workno,hdfs://localhost:54310/hadoop_test/text_files/lab_exercise6_file3.txt
works,hdfs://localhost:54310/hadoop_test/text_files/file1.txt
works,hdfs://localhost:54310/hadoop_test/text_files/lab_exercise6_file1.txt
works,hdfs://localhost:54310/hadoop_test/text_files/lab_exercise6_file2.txt
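
If you want to check whether a line really contains a tab, printing it with repr() makes the whitespace visible. The snippet below is only a small sketch using one line copied from the question; the variable names parts_tab and parts_ws are made up for illustration.

```python
# One line copied from the question's input (separated by spaces, not a tab).
line = "works,hdfs://localhost:54310/hadoop_test/text_files/file1.txt   33"

# repr() shows whitespace explicitly: a tab would appear as \t
print(repr(line))
print("\t" in line)  # is there any tab character in the line?

# With no tab present, split('\t', 1) returns the whole line as one element,
# so there is nothing to unpack into two variables.
parts_tab = line.split("\t", 1)

# split() with no arguments splits on any run of whitespace.
parts_ws = line.split()

print(len(parts_tab))
print(parts_ws)
```

If the real Hadoop output does use tabs, repr() will show \t between the key and the count, and splitting on '\t' is fine again.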

Actually, my intended output is the word (e.g. workmen) and total_num_words (e.g. 1). From each record of word, filename and total_num_words, how can I extract the word and its count (which I thought was tab-separated)?
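
One possible way to get the word and the count, sketched under the assumption that every record looks like "word,filename<whitespace>count" and that the word itself contains no comma. The helper parse_line and the names data and records are hypothetical, and only two of the lines from the question are used as sample input.

```python
from io import StringIO

# Two sample records copied from the question; the separator before the
# count may be spaces or a tab, both are handled below.
data = StringIO(
    "workmen,hdfs://localhost:54310/hadoop_test/text_files/file1.txt    1\n"
    " works,hdfs://localhost:54310/hadoop_test/text_files/file1.txt   33\n"
)


def parse_line(line):
    """Hypothetical helper: return (word, filename, count) for one record."""
    # rsplit(None, 1) splits once, from the right, on any run of whitespace,
    # which separates the trailing count from the "word,filename" key.
    key, count = line.strip().rsplit(None, 1)
    # The word ends at the first comma; everything after it is the filename.
    word, filename = key.split(",", 1)
    return word, filename, int(count)


# Skip blank lines so that rsplit always has something to unpack.
records = [parse_line(line) for line in data if line.strip()]

for word, filename, count in records:
    print(word, count)
```

In a Hadoop streaming script the same loop could read from sys.stdin instead of the StringIO object used here.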
