1
line1 = " The median income for a household in the city was $64,411, and the median income for a family was $78,940. The per capita income for the city was $22,466. About 4.3% of families and 5.9% of the population were below the poverty line, including 7.0% of those under age 18 and 12.3% of those age 65 or over."

line2 = " The median income for a household in the city was $31,893, and the median income for a family was $38,508. Males had a median income of $30,076 versus $20,275 for females. The per capita income for the city was $16,336. About 14.1% of families and 16.7% of the population were below the poverty line, including 21.8% of those under age 18 and 21.0% of those age 65 or over."

Expected output:

household median income: $64,411
family median income: $78,940
per capita income: $22,466



[householdIncome, familyIncome, perCapitalIncome] = re.findall(r"\d+,\d+", line1)

This works for line1, but line2 raises:

ValueError: too many values to unpack (expected 3)

The main goal is to locate each keyword and take the first dollar amount that follows it.

Some lines do not include the per capita income; for those, an empty string ("") is acceptable.

3 Answers

2

As others have pointed out, you'll need some additional programming logic. The following example uses a regular expression to find each keyword and the dollar amount that follows it; if that amount is immediately followed by "versus" and a second amount, the two are averaged:

import re, locale
from locale import atoi
# US conventions, so atoi() treats "," as the thousands separator
locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')

lines = ["The median income for a household in the city was $64,411, and the median income for a family was $78,940. The per capita income for the city was $22,466. About 4.3% of families and 5.9% of the population were below the poverty line, including 7.0% of those under age 18 and 12.3% of those age 65 or over.",
"The median income for a household in the city was $31,893, and the median income for a family was $38,508. Males had a median income of $30,076 versus $20,275 for females. The per capita income for the city was $16,336. About 14.1% of families and 16.7% of the population were below the poverty line, including 21.8% of those under age 18 and 21.0% of those age 65 or over."]

# define the regex
rx = re.compile(r'''
        (?P<type>household|family|per\ capita)
        \D+
        \$(?P<amount>\d[\d,]*\d)
        (?:
            \s+versus\s+
            \$(?P<amount2>\d[\d,]*\d)
        )?''', re.VERBOSE)

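def afterwork(match):
    # If a "versus" pair was captured, return the mean of the two amounts
    # (a float); otherwise return the single amount as an int.
    if match.group('amount2'):
        amount = (atoi(match.group('amount')) + atoi(match.group('amount2'))) / 2
    else:
        amount = atoi(match.group('amount'))
    return amount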

result = {}
for index, line in enumerate(lines):
    result['line' + str(index)] = [(m.group('type'), afterwork(m)) for m in rx.finditer(line)]

print(result)
# {'line0': [('household', 64411), ('family', 78940), ('per capita', 22466)], 'line1': [('household', 31893), ('family', 38508), ('per capita', 16336)]}
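
To get from there to the output format the question asks for, including "" for a keyword that is missing, something like the following might work. This is only a sketch: `extract_incomes`, `PATTERN` and `short_line` are names made up for illustration, it keeps the amounts as strings instead of going through `locale`, and `short_line` is just line2 cut off after the family figure to simulate a line without per capita income.

```python
import re

line1 = "The median income for a household in the city was $64,411, and the median income for a family was $78,940. The per capita income for the city was $22,466."
# Hypothetical test line: line2 truncated so that "per capita" is missing.
short_line = "The median income for a household in the city was $31,893, and the median income for a family was $38,508."

# A keyword, then non-digits, then the first dollar amount after the keyword.
PATTERN = re.compile(r"(household|family|per capita)\D+\$(\d+(?:,\d{3})*)")

def extract_incomes(line):
    """Return the amount found after each keyword, or "" if the keyword is absent."""
    incomes = {"household": "", "family": "", "per capita": ""}
    for keyword, amount in PATTERN.findall(line):
        # Keep only the first amount seen for each keyword.
        if not incomes[keyword]:
            incomes[keyword] = "$" + amount
    return incomes

incomes = extract_incomes(line1)
print("household median income:", incomes["household"])
print("family median income:", incomes["family"])
print("per capita income:", incomes["per capita"])

print(extract_incomes(short_line))
```

Because each amount is tied to its keyword, the extra male/female figures in line2 are skipped without any special handling.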

1 Comment

Looks great, can you provide more explanation?
2

The result of executing re.findall(r"\d+,\d+", line2) is ['31,893', '38,508', '30,076', '20,275', '16,336']. So the immediate problem is that the regex returns five results and you have allowed for only three. There is a slightly deeper problem, though: the two sentences have different structures. In the first, the only dollar amounts are the household, family and per capita incomes, in that order. In the second, the male and female median incomes appear between the family and per capita figures. You will need a more careful analysis of the sentence than taking the numbers in the order they appear.
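
The mismatch is easy to reproduce. The snippet below is a minimal check using line2 from the question (without the leading space); the variable names are mine.

```python
import re

line2 = "The median income for a household in the city was $31,893, and the median income for a family was $38,508. Males had a median income of $30,076 versus $20,275 for females. The per capita income for the city was $16,336. About 14.1% of families and 16.7% of the population were below the poverty line, including 21.8% of those under age 18 and 21.0% of those age 65 or over."

amounts = re.findall(r"\d+,\d+", line2)
print(len(amounts), amounts)
# 5 ['31,893', '38,508', '30,076', '20,275', '16,336']

try:
    # Three names on the left, five values on the right.
    household, family, per_capita = amounts
except ValueError as exc:
    message = str(exc)
    print(message)  # too many values to unpack (expected 3)
```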

1 Comment

You are exactly right. I just modified the question: most lines have the same keywords ("household", "family", "per capita income"), but some do not. I hope to be able to identify each keyword and its related value.
0

In line2, findall finds more than three matches, and you are trying to unpack them into only three variables.

Use something like this:

[householdIncome, familyIncome, perCapitalIncome] = re.findall(r"\d+,\d+", line1)[:3]
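
One caveat worth checking before relying on the slice: it removes the ValueError, but it takes the first three amounts in order, whatever they are. The sketch below (the `first_three` helper and `short_line` are illustrative names, and `short_line` is simply line2 truncated after the family figure) shows what that gives for line2, and pads with "" so that unpacking also works when fewer than three amounts are found.

```python
import re

line2 = "The median income for a household in the city was $31,893, and the median income for a family was $38,508. Males had a median income of $30,076 versus $20,275 for females. The per capita income for the city was $16,336."
# Hypothetical line with no per capita income.
short_line = "The median income for a household in the city was $31,893, and the median income for a family was $38,508."

def first_three(line):
    """Return exactly three strings: the first three amounts, padded with ""."""
    found = re.findall(r"\d+,\d+", line)
    return (found + ["", "", ""])[:3]

print(first_three(line2))       # ['31,893', '38,508', '30,076']
print(first_three(short_line))  # ['31,893', '38,508', '']
```

For line2 the third value is the male median income (30,076), not the per capita income (16,336), so the slice only gives the intended result when the three figures are the first three amounts in the line.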
