0

My program reports counts that are twice the expected values. The task is to read a file, extract each sender's email address (the token right after From), and display the number of emails per sender.

name = input("Enter file:")
if len(name) < 1 : name = "mbox-short.txt"
handle = open(name)
words = list()
count = dict()
for line in handle:
    line = line.rstrip()
    if line.startswith ('From'):
        y = line.split()
        print (y)
        words.append(y[1])
        x = y[1]
        print (x)
for w in words:
    count[w] = count.get(w, 0) + 1
print (count)
  • Please provide a minimal reproducible example with sample input. Commented Dec 28, 2020 at 7:36
  • Without seeing the data, it may be difficult to tell why your output differs from what you expect. Also, does line = line.rstrip() result in a list or a string? Commented Dec 28, 2020 at 8:08
  • Good to see that you removed the reference to the source file. It was not good to have the actual data floating around. I will edit my response to remove the output as well. Commented Dec 28, 2020 at 9:08

4 Answers

2

I reviewed the data in the input file. As BoarGules mentioned, the file contains lines that start with From (followed by a space) as well as lines that start with From: (followed by a colon). Looking at the data, it seems you want only the lines that start with From followed by a space.

Here's the code that will give you the desired result:

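count = dict()
# 'From ' (with a trailing space) matches only the envelope line, not the 'From:' header
with open('mbox-short.txt', 'r') as f:
    for line in f:
        if line.startswith('From '):
            w = line.split()[1]  # the address is the second token
            count[w] = count.get(w, 0) + 1

print(count)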

The output matches what you expected in your original post. (I removed the output itself because it contains sensitive personal information.)


2

The intent of your code is clearly to count the "from line" prefixed to every message in mbox format. But stored email messages also contain headers, and most emails contain a From: header, which your code also counts. That is why the counts are double what you expect.
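The double counting can be reproduced with a small in-memory sample. This is only a sketch: the addresses and the helper name count_senders are invented for illustration and are not taken from the original file.

```python
# Hypothetical mbox-style sample; the addresses are made up for illustration.
sample = """\
From alice@example.com Sat Jan  5 09:14:16 2008
Return-Path: <postmaster@example.com>
From: alice@example.com
Subject: first message

From bob@example.com Fri Jan  4 18:10:48 2008
From: bob@example.com
Subject: second message
"""


def count_senders(text, prefix):
    """Count the second token of every line that starts with prefix."""
    counts = {}
    for line in text.splitlines():
        if line.startswith(prefix):
            parts = line.split()
            if len(parts) > 1:
                counts[parts[1]] = counts.get(parts[1], 0) + 1
    return counts


loose = count_senders(sample, "From")    # also matches the "From:" header
strict = count_senders(sample, "From ")  # matches only the envelope line

print(loose)   # {'alice@example.com': 2, 'bob@example.com': 2}
print(strict)  # {'alice@example.com': 1, 'bob@example.com': 1}
```

Each message contributes one envelope line and one From: header, so the looser prefix counts every sender twice.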

3 Comments

  • Come on, it's not fair, the input file was missing! I won't be able to get out of my question ban... Sorry about the rant; I'm giving you my vote.
  • The input file format was clear from the question. The file is probably big, and it would have been quite improper to publish it without disguising it to the point that it could not be trusted to represent the original correctly. I'm already uncomfortable about genuine email addresses being published like this, even if they are mostly academic addresses that are widely available anyway.
  • I was just kidding. I am really new at Python, and here I usually get really informative and good answers to my many questions (maybe too many), and now I am stuck with the question ban.
2

Found the solution to my problem: there should be a space after From. Instead of if line.startswith('From') it should be if line.startswith('From ').
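For reference, a minimal sketch of the same fix written with collections.Counter. The function name count_from_lines is hypothetical, and the length check is an extra guard that the original code does not have.

```python
from collections import Counter


def count_from_lines(lines):
    """Tally sender addresses from lines that start with 'From ' (trailing space)."""
    senders = Counter()
    for line in lines:
        if not line.startswith("From "):
            continue  # skips "From:" headers and every other line
        parts = line.split()
        if len(parts) < 2:
            continue  # a bare "From " line has no address to count
        senders[parts[1]] += 1
    return senders


# Works on any iterable of lines; a list stands in for an open file here.
demo = count_from_lines([
    "From pippo Mon Dec 28 09:23:05 2020\n",
    "From: pippo\n",
    "From pluto Mon Dec 28 09:24:00 2020\n",
    "From \n",
])
print(dict(demo))  # {'pippo': 1, 'pluto': 1}
```

Because an open file is itself an iterable of lines, the handle returned by open(name) can be passed to the function directly.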

1

Using this as the input file mbox-short.txt:

From pippo
From pippo
From pippo
From pluto
From pluto
From papera
From papera
From pizza

and your code, saved as prova.py:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Mon Dec 28 09:23:05 2020

@author: Pietro
"""
name = input("Enter file:")
if len(name) < 1 : name = "mbox-short.txt"
handle = open(name)
words = list()
count = dict()
for line in handle:
    line = line.rstrip()
    if line.startswith ('From'):
        y = line.split()
        print (y)
        words.append(y[1])
        x = y[1]
        print (x)
for w in words:
    count[w] = count.get(w, 0) + 1
print (count)

I get:

$ ./prova.py
Enter file:
['From', 'pippo']
pippo
['From', 'pippo']
pippo
['From', 'pippo']
pippo
['From', 'pluto']
pluto
['From', 'pluto']
pluto
['From', 'papera']
papera
['From', 'papera']
papera
['From', 'pizza']
pizza
{'pippo': 3, 'pluto': 2, 'papera': 2, 'pizza': 1}

Maybe your Python is damaged somehow? Or mine is?

More on your task here:

word frequency program in python
