Python extract values from text using keys

Question

I have a text file in the following key-value format:

--START--
FirstName Kitty
LastName McCat
Color Red
random_data
Meow Meow
--END--

I want to extract specific values from the text into a variable or a dict. For example, if I want the values of LastName and Color, what would be the best way to do this?

The random_data may be anywhere in the file and span multiple lines.

I've considered using regex, but I'm concerned about performance and readability, since in the real code I have many different keys to extract.

I could also loop over each line and check for each key, but that gets messy with 10+ keys. For example:

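if line.startswith("LastName"):
    last_name = line.split(" ", 1)[1].strip()  # split line at the first space and handle
if line.startswith("Color"):
    color = line.split(" ", 1)[1].strip()  # split line at the first space and handle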

I'm hoping for something a little cleaner.

LastName and Color could be grabbed with a regular expression. The random data would be nearly impossible to extract without specific markers in it.
– OneCricketeer, Commented Jan 27, 2016 at 23:10
Sorry, I wasn't very clear: I want to ignore the random_data and pass over it. I thought about regex but am a little concerned about performance and readability. Ideally I would like to be able to define a list of tokens to extract: tokens = ['LastName', 'Color']
– James Harding, Commented Jan 27, 2016 at 23:12
Well, the first question that usually gets asked is: what have you tried?
– OneCricketeer, Commented Jan 27, 2016 at 23:14
Updated the original post. I'm wondering if there's something cleaner than what I posted.
– James Harding, Commented Jan 27, 2016 at 23:19
Can there be multiple instances of FirstName or any other field?
– Garrett R, Commented Jan 27, 2016 at 23:26

Lea · Accepted Answer · 2016-01-27 23:23:07Z

1

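tokens = ['LastName', 'Color']
dictResult = {}
fileName = 'sampletxt.txt'  # placeholder path; point this at your file
with open(fileName, 'r') as fileHandle:
    for line in fileHandle:
        # split() with no argument splits on any whitespace and drops the trailing newline
        lineParts = line.split()
        if len(lineParts) == 2 and lineParts[0] in tokens:
            dictResult[lineParts[0]] = lineParts[1]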
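
The len(lineParts) == 2 check skips any line whose value contains a space. If values can contain spaces, one possible variant is to split only at the first run of whitespace. This is a minimal sketch, not part of the original answer: it reads from an in-memory stand-in for the file, and the multi-word Color value is a made-up example.

```python
import io

# In-memory stand-in for the file; the multi-word Color value is a made-up example
sample = io.StringIO(
    "--START--\n"
    "FirstName Kitty\n"
    "LastName McCat\n"
    "Color Dark Red\n"
    "random_data\n"
    "Meow Meow\n"
    "--END--\n"
)

tokens = {'LastName', 'Color'}  # a set gives constant-time membership tests
dictResult = {}
for line in sample:
    # maxsplit=1 keeps everything after the key together as the value
    lineParts = line.split(None, 1)
    if len(lineParts) == 2 and lineParts[0] in tokens:
        dictResult[lineParts[0]] = lineParts[1].strip()

print(dictResult)
```

With this sample, dictResult ends up as {'LastName': 'McCat', 'Color': 'Dark Red'}. Lines such as random_data and Meow Meow are still ignored because their first word is not in tokens.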

edited Jan 27, 2016 at 23:23

answered Jan 27, 2016 at 23:19


Garrett R · Accepted Answer · 2016-01-27 23:20:49Z

0

Assuming your text is in a file called sampletxt.txt, this would work. It creates a dictionary mapping each key to a list of values.

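import re

with open('sampletxt.txt', 'r') as f:
    txt = f.read()

keys = ['FirstName', 'LastName', 'Color']
d = {}
for key in keys:
    # Anchor the key at the start of a line; (.*) captures the rest of that line
    d[key] = re.findall(r'^' + re.escape(key) + r'[ \t]+(.*)$', txt, re.MULTILINE)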

answered Jan 27, 2016 at 23:20


OneCricketeer Over a year ago

You might want to use re.findall(...)[0]; otherwise each value is a list containing a single value.
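
Following up on that comment, here is a minimal sketch of collapsing each list to its first match without raising IndexError when a key is missing. The helper name first_values and the extra 'Age' key are illustrative assumptions, and the sample text is copied from the question rather than read from a file.

```python
import re

# Sample text copied from the question; in practice this would come from f.read()
txt = """--START--
FirstName Kitty
LastName McCat
Color Red
random_data
Meow Meow
--END--"""

keys = ['FirstName', 'LastName', 'Color', 'Age']  # 'Age' is deliberately absent


def first_values(text, keys):
    """Return {key: first value found}, skipping keys that never match."""
    result = {}
    for key in keys:
        # Key anchored at the start of a line; (.*) captures the rest of that line
        matches = re.findall(r'^' + re.escape(key) + r'[ \t]+(.*)$', text, re.MULTILINE)
        if matches:  # indexing [0] on an empty list would raise IndexError
            result[key] = matches[0]
    return result


print(first_values(txt, keys))
```

For the sample text this gives {'FirstName': 'Kitty', 'LastName': 'McCat', 'Color': 'Red'}; the missing 'Age' key is simply left out.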

OneCricketeer · Accepted Answer · 2016-01-27 23:25:43Z

0

This version allows you to optionally specify the tokens.

import re

s = """--START--
FirstName Kitty
LastName McCat
Color Red
random_data
Meow Meow
--END--"""

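tokens = ["LastName", "Color"]
if len(tokens) == 0:
    # No tokens given: return every "word word" pair in the text
    print(re.findall(r"(\w+) (\w+)", s))
else:
    # One (token, value) pair per requested token; [0] takes the first match
    print([(t, re.findall(r"{} (\w+)".format(re.escape(t)), s)[0]) for t in tokens])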

Output

[('LastName', 'McCat'), ('Color', 'Red')]
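
Since the question mentions having many keys, one possible variation is to fold all tokens into a single compiled pattern so the text is scanned once instead of once per token. This is a sketch under assumptions, not part of the original answer: the names pattern and result are illustrative, and tokens is assumed to be non-empty.

```python
import re

s = """--START--
FirstName Kitty
LastName McCat
Color Red
random_data
Meow Meow
--END--"""

tokens = ["LastName", "Color"]  # assumed non-empty

# Build one alternation, e.g. ^(LastName|Color)[ \t]+(.*)$, so the text is scanned once
pattern = re.compile(
    r"^(" + "|".join(re.escape(t) for t in tokens) + r")[ \t]+(.*)$",
    re.MULTILINE,
)

# findall returns (key, value) tuples; dict() keeps the last value if a key repeats
result = dict(pattern.findall(s))
print(result)
```

For the sample text, result is {'LastName': 'McCat', 'Color': 'Red'}. Adding more keys only lengthens the tokens list; the matching code stays the same.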

answered Jan 27, 2016 at 23:25


Cooper Gillan · Accepted Answer · 2016-01-27 23:29:15Z

Building off the other answers, this function would use regular expressions to take any text key and return the value if found:

import re
file_name = 'test.txt'

def get_text_value(text_key, file_name):
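    # Raw string avoids invalid-escape warnings; \s*$ also matches a last line with no newline
    match_str = re.escape(text_key) + r"\s(\w+)\s*$"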

    with open(file_name, "r") as f:
        text_to_check = f.readlines()

    text_value = None
    for line in text_to_check:

        matched = re.match(match_str, line)
        if matched:
            text_value = matched.group(1)

    return text_value

if __name__ == "__main__":

    first_key = "FirstName"
    first_value = get_text_value(first_key, file_name)
    print('Check for first key "{}" and value "{}"'.format(first_key,
                                                           first_value))

    second_key = "Color"
    second_value = get_text_value(second_key, file_name)
    print('Check for second key "{}" and value "{}"'.format(second_key,
                                                            second_value))
