1

I am writing a program that prompts for a file name, then opens that file and reads through it, looking for lines of the form:

X-DSPAM-Confidence: 0.8475

I want to count these lines, extract the floating-point value from each one, and compute the average of those values. Could I please get some help? I have just started programming, so I need something very simple. This is the code I have written so far:

fname = raw_input("Enter file name: ")
if len(fname) == 0:
    fname = 'mbox-short.txt'
fh = open(fname,'r')
count = 0
total = 0
#Average = total/num of lines
for line in fh:
    if not line.startswith("X-DSPAM-Confidence:"): continue
    count = count+1
    print line

3 Answers

1

Try:

total += float(line.split(' ')[1])

so that total / count gives you the answer.
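
Put together with the loop from the question, the whole calculation might look like the sketch below. It is written for Python 3, so print is a function. The average_confidence helper name and the sample lines are made up for illustration; in the real program the lines would come from the open file fh.

```python
def average_confidence(lines):
    """Return the mean of the X-DSPAM-Confidence values found in lines."""
    count = 0
    total = 0.0
    for line in lines:
        if not line.startswith("X-DSPAM-Confidence:"):
            continue
        count = count + 1
        # 'X-DSPAM-Confidence: 0.8475\n' -> ['X-DSPAM-Confidence:', '0.8475\n']
        total += float(line.split(' ')[1])
    if count == 0:
        return None  # no matching lines, so there is no average to report
    return total / count


# Hypothetical sample data standing in for the lines of mbox-short.txt
sample = [
    "From: someone@example.com\n",
    "X-DSPAM-Confidence: 0.5\n",
    "X-DSPAM-Confidence: 1.0\n",
]
print(average_confidence(sample))  # prints 0.75
```

The check for count == 0 is an addition: without it, a file with no matching lines would raise ZeroDivisionError.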


3 Comments

What does this actually do, please?
This splits a string wherever there is a space, so 'X-DSPAM-Confidence: 0.8475' becomes ['X-DSPAM-Confidence:', '0.8475']. The [1] gets the second item, the number as a string, and float converts the string to an actual number that can be added to the total.
In case it's not clear, you wrote all the code correctly, you just need the line I wrote in place of where you wrote print line.
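
The steps described in the comments can be tried one at a time, for example in the interactive interpreter. This is only a sketch in Python 3; the sample line is typed in by hand and includes the trailing newline that a line read from a file would normally carry.

```python
line = "X-DSPAM-Confidence: 0.8475\n"

parts = line.split(' ')   # split wherever there is a space
text = parts[1]           # second item: the number, still a string
value = float(text)       # float() ignores the trailing newline

print(parts)   # ['X-DSPAM-Confidence:', '0.8475\n']
print(value)   # 0.8475
```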
1

Iterate over the file, looking for such lines as you did, and read the values in like this. Opening the file with the context manager ("with") closes it automatically:

fname = raw_input("Enter file name:")
if not fname:
    fname = "mbox-short.txt"
scores = []
with open(fname) as f:
    for line in f:
        if not line.startswith("X-DSPAM-Confidence:"):
            continue
        _, score = line.split()
        scores.append(float(score))
print sum(scores)/len(scores)

Or a bit more compact:

mean = lambda x: sum(x)/len(x)
with open(fname) as f:
    result = mean([float(l.split()[1]) for l in f if l.startswith("X-DSPAM-Confidence:")])

1 Comment

The code you have written for me is far above my level. Can you please use the split or find function to extract the values, put them in a variable, and then use that variable to calculate the total? Or, at the part where it iterates through fh and skips lines that don't start with this format (X-DSPAM-Confidence: 0.8475), can you put the lines it does not skip into a variable? That would make it much easier to work with. @James Hopkin
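
A simpler version along the lines requested here might look like the following sketch. It keeps the lines that are not skipped in a list (called kept, a name chosen for illustration), then uses find and slicing to pull the number out of each one. It is written for Python 3, and the sample lines are hypothetical stand-ins for the file contents.

```python
# Hypothetical sample lines; in the real program these come from the file.
lines = [
    "Subject: hello\n",
    "X-DSPAM-Confidence: 0.8475\n",
    "X-DSPAM-Confidence: 0.6178\n",
]

kept = []  # the lines that were not skipped
for line in lines:
    if not line.startswith("X-DSPAM-Confidence:"):
        continue
    kept.append(line)

total = 0.0
for line in kept:
    colon = line.find(":")           # position of the colon
    number_text = line[colon + 1:]   # everything after it, e.g. ' 0.8475\n'
    total = total + float(number_text)

# Assumes at least one line matched; otherwise this divides by zero.
average = total / len(kept)
print(average)
```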
0

A program like the following should satisfy your needs. If you need to change what the program is looking for, just change the PATTERN variable to describe what you are trying to match. The code is written for Python 3.x but can be adapted for Python 2.x without much difficulty if needed.

Program:

#! /usr/bin/env python3
import re
import statistics
import sys


PATTERN = r'X-DSPAM-Confidence:\s*(?P<float>[+-]?\d*\.\d+)'


def main(argv):
    """Calculate the average X-DSPAM-Confidence from a file."""
    filename = argv[1] if len(argv) > 1 else input('Filename: ')
    if filename in {'', 'default'}:
        filename = 'mbox-short.txt'
    print('Average:', statistics.mean(get_numbers(filename)))
    return 0


def get_numbers(filename):
    """Extract all X-DSPAM-Confidence values from the named file."""
    with open(filename) as file:
        for line in file:
            for match in re.finditer(PATTERN, line, re.IGNORECASE):
                yield float(match.groupdict()['float'])


if __name__ == '__main__':
    sys.exit(main(sys.argv))

You may also implement the get_numbers generator in the following way if desired.

Alternative:

def get_numbers(filename):
    """Extract all X-DSPAM-Confidence values from the named file."""
    with open(filename) as file:
        yield from (float(match.groupdict()['float'])
                    for line in file
                    for match in re.finditer(PATTERN, line, re.IGNORECASE))
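
To get a feel for what PATTERN accepts, it can be tried on a few hand-written strings. The sketch below uses re.search rather than re.finditer for brevity, and the sample strings are invented for illustration. Note that the pattern requires a decimal point, so a bare integer such as 1 is not matched.

```python
import re

PATTERN = r'X-DSPAM-Confidence:\s*(?P<float>[+-]?\d*\.\d+)'

samples = [
    "X-DSPAM-Confidence: 0.8475",  # the usual form
    "x-dspam-confidence:.5",       # different case, no space, no leading digit
    "X-DSPAM-Confidence: 1",       # no decimal point, so no match
]

results = []
for text in samples:
    match = re.search(PATTERN, text, re.IGNORECASE)
    results.append(float(match.group('float')) if match else None)

print(results)  # [0.8475, 0.5, None]
```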
