I need to extract data from the lines of a text file. The data is a list of x, y, z feature locations formatted like this:

Feature_Locations:
   - { x:9.0745818614959717e-01, y:2.8846755623817444e-01,
       z:3.5268107056617737e-01 }
   - { x:1.1413983106613159e+00, y:2.7305576205253601e-01,
       z:4.4357028603553772e-01 }
   - { x:1.7582545280456543e+00, y:2.2776308655738831e-01,
       z:6.6982054710388184e-01 }
   - { x:9.6545284986495972e-01, y:2.8368893265724182e-01,
       z:3.6416915059089661e-01 }
   - { x:1.2183872461318970e+00, y:2.7094465494155884e-01,
       z:4.5954680442810059e-01 }

This file is generated by another program. I want to read that data back into my program and save each coordinate to its own file, for example "axeX.txt", "axeY.txt" and "axeZ.txt".

I have tried this:

import numpy as np
import matplotlib.pyplot as plt
import re
file = open('data.txt', "r")
for r in file:
    y = re.sub("- {", "",r).split()
    tt = y[:2]
    zz = tt
    st = re.findall('\d+', r)
    print st
file.close()

Is there a better way, or am I doing it wrong?

2 Answers

The input file is in YAML format. I recommend using the PyYAML package to parse YAML files.

import yaml

document = """
Feature_Locations:
   - { x: 9.0745818614959717e-01, y: 2.8846755623817444e-01,
       z: 3.5268107056617737e-01 }
   - { x: 1.1413983106613159e+00, y: 2.7305576205253601e-01,
       z: 4.4357028603553772e-01 }
   - { x: 1.7582545280456543e+00, y: 2.2776308655738831e-01,
       z: 6.6982054710388184e-01 }
   - { x: 9.6545284986495972e-01, y: 2.8368893265724182e-01,
       z: 3.6416915059089661e-01 }
   - { x: 1.2183872461318970e+00, y: 2.7094465494155884e-01,
       z: 4.5954680442810059e-01 }
"""

# safe_load needs no Loader argument (yaml.load requires one in PyYAML 6+)
locations = yaml.safe_load(document)['Feature_Locations']

for ch in 'XYZ':
    fname = 'axe%s.txt' %ch
    with open(fname, 'w') as fh:
        for item in locations:
            fh.write('%s\n' % item[ch.lower()])

The input file as generated is not quite valid YAML: it has no space after the colons inside the braces (x:9.07... instead of x: 9.07...). yamllint will do a sanity check and report the errors.

yamllint inputfile.yaml
inputfile.yaml
  1:1       warning  missing document start "---"  (document-start)
  2:9       error    syntax error: found unexpected ':'

In this case we can fix the input file easily.

 sed -i 's/:/: /g' inputfile.yaml
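
If you would rather not modify the file on disk, the same repair can be done in memory before parsing. This is only a sketch: the function names add_space_after_colons and load_locations are made up for illustration, and it assumes the only defect in the file is the missing space after the colons.

```python
import re


def add_space_after_colons(text):
    """Insert a space after every colon that is directly followed by a non-space character."""
    # The lookahead leaves "key: value" and a colon at the end of a line untouched.
    return re.sub(r":(?=\S)", ": ", text)


def load_locations(path):
    """Read the file, repair the colons in memory, and parse the result with PyYAML."""
    import yaml  # PyYAML; imported here so the helper above has no third-party dependency

    with open(path) as fh:
        fixed = add_space_after_colons(fh.read())
    return yaml.safe_load(fixed)["Feature_Locations"]
```

Calling load_locations('data.txt') should then give the same list of dictionaries as the locations variable above, so the loop that writes the axe files can be reused unchanged.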

1 Comment

It seems you had to preprocess the document by adding spaces between the variables (x, y, z) and the actual values. Is there any straightforward way of doing that using PyYAML?

You can try something like:

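import re

# Read the whole file at once; the with-block closes it afterwards.
with open('data.txt', 'r') as f:
    s = f.read()

# '.' does not match a newline, so each match stays within one line.
x = re.findall(r'x:(.*), ', s)
y = re.findall(r'y:(.*),', s)
z = re.findall(r'z:(.*) ', s)

with open('axeX.txt', 'w') as f: f.write('\n'.join(x))
with open('axeY.txt', 'w') as f: f.write('\n'.join(y))
with open('axeZ.txt', 'w') as f: f.write('\n'.join(z))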
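
The patterns above depend on the exact spacing and commas around each value. A slightly stricter variant, as a sketch only, matches the numbers themselves and picks up x, y and z of one entry in a single pass. The names NUMBER, ENTRY and extract_columns are made up for illustration, and the pattern assumes every value is written in scientific notation as in the sample.

```python
import re

# One float in scientific notation, e.g. 9.0745818614959717e-01
NUMBER = r"[-+]?\d+\.\d+[eE][-+]?\d+"

# \s* also matches a newline, so an entry may be split across two lines.
ENTRY = re.compile(r"x:\s*(%s),\s*y:\s*(%s),\s*z:\s*(%s)" % (NUMBER, NUMBER, NUMBER))


def extract_columns(text):
    """Return three lists of strings (x, y, z), one item per feature location."""
    matches = ENTRY.findall(text)
    if not matches:
        return [], [], []
    xs, ys, zs = zip(*matches)
    return list(xs), list(ys), list(zs)
```

The three lists can be written out with '\n'.join(...) exactly as in the snippet above.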

1 Comment

Works fine! Thank you so much! :D
