
I have a text file in the following key-value format:

--START--
FirstName Kitty
LastName McCat
Color Red
random_data
Meow Meow
--END--

I want to extract specific values from the text into variables or a dict. For example, if I want to extract the values of LastName and Color, what would be the best way to do this?

The random_data may be anywhere in the file and span multiple lines.

I've considered using regex, but I'm concerned about performance and readability, because in the real code I have many different keys to extract.

I could also loop over each line and check for each key, but that gets quite messy with 10+ keys. For example:

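with open("data.txt") as f:  # hypothetical file name
    for line in f:
        if line.startswith("LastName"):
            last_name = line.partition(" ")[2].strip()  # text after the first space
        if line.startswith("Color"):
            color = line.partition(" ")[2].strip()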

I'm hoping for something a little cleaner.
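One way to keep many keys readable while scanning the text only once is to build a single regular expression from the whole key list. A minimal sketch, assuming each wanted value sits on the same line as its key after a space or tab; `extract` and `sample` are hypothetical names:

```python
import re

def extract(text, keys):
    # One alternation such as ^(LastName|Color)[ \t]+(.*)$ covers every key,
    # so the text is scanned once no matter how many keys there are.
    pattern = re.compile(
        r"^(" + "|".join(map(re.escape, keys)) + r")[ \t]+(.*)$",
        re.MULTILINE,  # ^ and $ match at every line boundary
    )
    # Each match yields (key, value); a repeated key overwrites the earlier value.
    return {m.group(1): m.group(2).strip() for m in pattern.finditer(text)}

sample = """--START--
FirstName Kitty
LastName McCat
Color Red
random_data
Meow Meow
--END--"""

print(extract(sample, ["LastName", "Color"]))  # {'LastName': 'McCat', 'Color': 'Red'}
```

Keys that never appear are simply absent from the result, so `.get()` can supply a default, and lines such as `random_data` are skipped because they don't start with a listed key.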

  • LastName and Color could be grabbed with a regular expression. The random data would be nearly impossible to extract without specific markers in it. Commented Jan 27, 2016 at 23:10
  • Sorry, I wasn't very clear: I want to ignore the random_data and skip over it. I thought about regex but am a little concerned about performance and readability. Ideally I would like to be able to define a list of tokens to extract: tokens = ['LastName', 'Color'] Commented Jan 27, 2016 at 23:12
  • Well, the first question that usually gets asked is: what have you tried? Commented Jan 27, 2016 at 23:14
  • Updated the original post. I'm wondering if there's something cleaner than what I posted. Commented Jan 27, 2016 at 23:19
  • Can there be multiple instances of FirstName or any other field? Commented Jan 27, 2016 at 23:26
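If a field can repeat, as the last comment asks, one option is to collect every value per key in a list. A sketch using `collections.defaultdict`, assuming the key and value are separated by the first run of whitespace on a line; `collect` and `lines` are hypothetical names (an open file object can be passed in place of `lines`):

```python
from collections import defaultdict

def collect(lines, tokens):
    wanted = set(tokens)  # set membership is a constant-time lookup
    found = defaultdict(list)
    for line in lines:
        parts = line.split(None, 1)  # split at the first whitespace run only
        if len(parts) == 2 and parts[0] in wanted:
            found[parts[0]].append(parts[1].strip())
    return dict(found)

lines = ["FirstName Kitty\n", "Color Red\n", "random_data\n", "Color Blue\n"]
print(collect(lines, ["Color"]))  # {'Color': ['Red', 'Blue']}
```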

4 Answers

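tokens = ['LastName', 'Color']
fileName = 'data.txt'  # hypothetical path; point this at your file
dictResult = {}
with open(fileName, 'r') as fileHandle:
    for line in fileHandle:
        # split() splits on any whitespace and also drops the trailing '\n',
        # which split(" ") would have left attached to the value
        lineParts = line.split()
        if len(lineParts) == 2 and lineParts[0] in tokens:
            dictResult[lineParts[0]] = lineParts[1]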
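The `len(lineParts) == 2` check skips any value that itself contains a space (for example a hypothetical line `FirstName Kitty Cat`). A variant that splits only at the first space with `str.partition` keeps such values whole; `parse_tokens` is a hypothetical name, and `io.StringIO` stands in for the open file:

```python
import io

def parse_tokens(file_handle, tokens):
    wanted = set(tokens)
    result = {}
    for line in file_handle:
        # partition splits at the first space only; sep is '' when there is no space
        key, sep, value = line.rstrip("\n").partition(" ")
        if sep and key in wanted:
            result[key] = value.strip()
    return result

data = io.StringIO("FirstName Kitty Cat\nLastName McCat\nrandom_data\nColor Red\n")
print(parse_tokens(data, ["FirstName", "Color"]))  # {'FirstName': 'Kitty Cat', 'Color': 'Red'}
```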

Assuming your file is called sampletxt.txt, this works. It creates a dictionary that maps each key to a list of its values.

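import re

with open('sampletxt.txt', 'r') as f:
    txt = f.read()

keys = ['FirstName', 'LastName', 'Color']
d = {}
for key in keys:
    # ^ with re.MULTILINE anchors the key at the start of a line, so a key like
    # "Name" cannot match inside "LastName"; (.*) stops at the end of the line
    d[key] = re.findall(r'^' + re.escape(key) + r'[ \t]+(.*)$', txt, re.MULTILINE)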

1 Comment

You might want to use re.findall()[0]; otherwise each value is a list containing a single item.
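Indexing with `[0]` raises `IndexError` when a key is missing from the file. A sketch that instead returns the first match or `None`, using `re.search`; `first_value` is a hypothetical helper and `txt` is a stand-in for the file contents:

```python
import re

def first_value(key, txt):
    # ^ with re.MULTILINE anchors the key at the start of a line
    m = re.search(r"^" + re.escape(key) + r"[ \t]+(.*)$", txt, re.MULTILINE)
    return m.group(1).strip() if m else None

txt = "FirstName Kitty\nLastName McCat\nColor Red\n"
d = {key: first_value(key, txt) for key in ["LastName", "Color", "Age"]}
print(d)  # {'LastName': 'McCat', 'Color': 'Red', 'Age': None}
```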

This version lets you optionally specify the tokens:

import re

s = """--START--
FirstName Kitty
LastName McCat
Color Red
random_data
Meow Meow
--END--"""

tokens = ["LastName", "Color"]
if len(tokens) == 0:
    # No tokens given: return every "word word" pair in the text
    print(re.findall(r"(\w+) (\w+)", s))
else:
    # Raw strings avoid invalid "\w" escape warnings; [0] assumes every token is present
    print([(t, re.findall(r"{} (\w+)".format(re.escape(t)), s)[0]) for t in tokens])

Output

[('LastName', 'McCat'), ('Color', 'Red')]


Building on the other answers, this function uses a regular expression to look up any text key and return its value if found (or None otherwise):

import re

file_name = 'test.txt'


def get_text_value(text_key, file_name):
    # Raw string avoids the invalid "\s" escape warning; re.escape guards keys
    # containing regex metacharacters, and $ also matches a last line with no "\n"
    match_str = re.escape(text_key) + r"\s(\w+)$"

    with open(file_name, "r") as f:
        text_to_check = f.readlines()

    text_value = None
    for line in text_to_check:
        matched = re.match(match_str, line)  # re.match anchors at the start of the line
        if matched:
            text_value = matched.group(1)  # if the key repeats, the last match wins

    return text_value


if __name__ == "__main__":
    first_key = "FirstName"
    first_value = get_text_value(first_key, file_name)
    print('Check for first key "{}" and value "{}"'.format(first_key, first_value))

    second_key = "Color"
    second_value = get_text_value(second_key, file_name)
    print('Check for second key "{}" and value "{}"'.format(second_key, second_value))
