Calculating the average in Python

Question

I am writing a program that prompts for a file name, then opens that file and reads through it, looking for lines of the form:

X-DSPAM-Confidence: 0.8475

I want to count these lines, extract the floating-point value from each of them, and compute the average of those values. Can I please get some help? I just started programming, so I need something very simple. This is the code I have written so far:

fname = raw_input("Enter file name: ")
if len(fname) == 0:
    fname = 'mbox-short.txt'
fh = open(fname,'r')
count = 0
total = 0
#Average = total/num of lines
for line in fh:
    if not line.startswith("X-DSPAM-Confidence:"): continue
    count = count+1
    print line

James Hopkin · Accepted Answer · 2016-02-19 15:25:43Z


Try:

total += float(line.split(' ')[1])

so that, once the loop has finished, total / count gives you the average.

answered Feb 19, 2016 at 15:25

James Hopkin



3 Comments

Maybe Over a year ago

What does this actually do, please?

James Hopkin Over a year ago

This splits a string wherever there is a space, so 'X-DSPAM-Confidence: 0.8475' becomes ['X-DSPAM-Confidence:', '0.8475']. The [1] gets the second item, the number as a string, and float converts the string to an actual number that can be added to the total.
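A quick way to see this for yourself, using the sample line from the question (Python 3 syntax, so print is a function). Note that a line read from a file keeps its trailing newline; float() ignores it, and split() with no argument drops it:

```python
# A matching line as it comes out of the file, trailing newline included.
line = "X-DSPAM-Confidence: 0.8475\n"

parts = line.split(" ")   # split on single spaces
print(parts)              # ['X-DSPAM-Confidence:', '0.8475\n']

value = float(parts[1])   # float() ignores surrounding whitespace such as '\n'
print(value)              # 0.8475

# With no argument, split() splits on any run of whitespace and drops the newline.
print(line.split())       # ['X-DSPAM-Confidence:', '0.8475']
```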

James Hopkin Over a year ago

In case it's not clear: you wrote all the code correctly. You just need to put the line I wrote where you currently have print line.
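Putting that together, here is a sketch of the question's loop with that line in place of print line. It is written for Python 3 (print is a function and input replaces raw_input), and it is wrapped in a hypothetical helper, average_confidence, so it can be tried on a few made-up lines without needing the real file:

```python
def average_confidence(fh):
    """Return (count, average) of the X-DSPAM-Confidence values in fh.

    fh can be an open file or any iterable of text lines.
    """
    count = 0
    total = 0.0
    for line in fh:
        if not line.startswith("X-DSPAM-Confidence:"):
            continue
        count = count + 1
        # The suggested line, in place of `print line`:
        total += float(line.split(' ')[1])
    # Average = total / number of matching lines (raises ZeroDivisionError if count == 0)
    return count, total / count


# Usage with a real file, as in the question:
# fname = input("Enter file name: ") or "mbox-short.txt"
# with open(fname) as fh:
#     print(average_confidence(fh))

# Made-up sample lines standing in for the file.
sample = [
    "From someone@example.com\n",
    "X-DSPAM-Confidence: 0.8475\n",
    "Subject: hello\n",
    "X-DSPAM-Confidence: 0.6178\n",
]
print(average_confidence(sample))
```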

L3viathan · Accepted Answer · 2016-02-19 15:29:44Z


Iterate over the file (using a context manager, i.e. a with statement, closes the file automatically), look for matching lines as you already do, and then read the values in like this:

fname = raw_input("Enter file name:")
if not fname:
    fname = "mbox-short.txt"
scores = []
with open(fname) as f:
    for line in f:
        if not line.startswith("X-DSPAM-Confidence:"):
            continue
        _, score = line.split()
        scores.append(float(score))
print sum(scores)/len(scores)

Or a bit more compact:

mean = lambda x: sum(x)/len(x)
with open(fname) as f:
    result = mean([float(l.split()[1]) for l in f if l.startswith("X-DSPAM-Confidence:")])
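Both versions divide by the number of matching lines, so if the file contains no X-DSPAM-Confidence lines at all, they fail with ZeroDivisionError. A minimal sketch of one way to guard against that (Python 3; mean_or_none is a hypothetical helper, not part of the answer):

```python
def mean_or_none(values):
    """Return the arithmetic mean of values, or None if values is empty."""
    if not values:  # an empty list would otherwise divide by zero
        return None
    return sum(values) / len(values)


print(mean_or_none([1.0, 2.0, 3.0]))  # 2.0
print(mean_or_none([]))               # None
```

Returning None is just one choice; raising a clearer error message or printing a notice would work equally well.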

edited Feb 19, 2016 at 15:29

answered Feb 19, 2016 at 15:24

L3viathan


1 Comment

Maybe Over a year ago

The code you have written for me is far above my level. Can you please use the split or find function to extract the values, put them in a variable, and then use that variable to calculate the total? Or, since the loop over fh skips lines that don't have this format (X-DSPAM-Confidence: 0.8475), can you put the lines it does not skip into a variable? That would make them much easier to work with. @James Hopkin
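Since this comment asks for find and for keeping the non-skipped lines in a variable, here is one possible sketch in that style (Python 3; the sample lines are made up, and the list name values is chosen for illustration). It finds the colon in each matching line and converts everything after it:

```python
# Made-up lines standing in for the open file fh from the question.
lines = [
    "From someone@example.com\n",
    "X-DSPAM-Confidence: 0.8475\n",
    "Subject: hello\n",
    "X-DSPAM-Confidence: 0.6178\n",
    "X-DSPAM-Confidence: 0.6961\n",
]

values = []  # collects the numbers from the lines that are NOT skipped
for line in lines:
    if not line.startswith("X-DSPAM-Confidence:"):
        continue
    colon_pos = line.find(":")          # index of the first ':' in the line
    number_text = line[colon_pos + 1:]  # everything after the colon, e.g. ' 0.8475\n'
    values.append(float(number_text))   # float() ignores the surrounding whitespace

total = sum(values)
average = total / len(values)
print(len(values), total, average)
```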

Noctis Skytower · Accepted Answer · 2016-02-19 17:43:48Z

A program like the following should satisfy your needs. If you need to change what the program is looking for, just change the PATTERN variable to describe what you are trying to match. The code is written for Python 3.x but can be adapted for Python 2.x without much difficulty if needed.

Program:

#! /usr/bin/env python3
import re
import statistics
import sys


PATTERN = r'X-DSPAM-Confidence:\s*(?P<float>[+-]?\d*\.\d+)'


def main(argv):
    """Calculate the average X-DSPAM-Confidence from a file."""
    filename = argv[1] if len(argv) > 1 else input('Filename: ')
    if filename in {'', 'default'}:
        filename = 'mbox-short.txt'
    print('Average:', statistics.mean(get_numbers(filename)))
    return 0


def get_numbers(filename):
    """Extract all X-DSPAM-Confidence values from the named file."""
    with open(filename) as file:
        for line in file:
            for match in re.finditer(PATTERN, line, re.IGNORECASE):
                yield float(match.groupdict()['float'])


if __name__ == '__main__':
    sys.exit(main(sys.argv))

You may also implement the get_numbers generator in the following way if desired.

Alternative:

def get_numbers(filename):
    """Extract all X-DSPAM-Confidence values from the named file."""
    with open(filename) as file:
        yield from (float(match.groupdict()['float'])
                    for line in file
                    for match in re.finditer(PATTERN, line, re.IGNORECASE))
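Before changing PATTERN, it can help to check which lines it accepts. A small sketch using the same pattern (the sample strings are made up; re.search is used because each sample is a single line, whereas the program uses re.finditer to get every match in a line). As written, the pattern requires a decimal point, so a value such as 1 with no fractional part is skipped. match.group('float') returns the same string as match.groupdict()['float']:

```python
import re

PATTERN = r'X-DSPAM-Confidence:\s*(?P<float>[+-]?\d*\.\d+)'

samples = [
    "X-DSPAM-Confidence: 0.8475",   # matches
    "x-dspam-confidence:0.5",       # matches: IGNORECASE, and \s* allows no space
    "X-DSPAM-Confidence: 1",        # no match: a decimal point is required
    "X-DSPAM-Probability: 0.0000",  # no match: different header
]

for text in samples:
    match = re.search(PATTERN, text, re.IGNORECASE)
    print(repr(text), "->", match.group('float') if match else None)
```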
