3

I am trying to import a weirdly formatted text file into a pandas DataFrame. Two example lines are below:

LOADED LANE       1   MAT. TYPE=    2    LEFFECT=    1    SPAN=  200.    SPACE=   10.    BETA=   3.474 LOADEFFECT 5075.    LMAX= 3643.    COV=  .13
LOADED LANE       1   MAT. TYPE=    3    LEFFECT=    1    SPAN=  200.    SPACE=   10.    BETA=   3.515 LOADEFFECT10009.    LMAX= 9732.    COV=  .08

First I tried the following:

df = pd.read_csv('beta.txt', header=None, delim_whitespace=True, usecols=[2,5,7,9,11,13,15,17,19])

This seemed to work fine, however got messed up when it hit the above example line, where there is no whitespace after the LOADEFFECT string (you may need to scroll a bit right to see it in the example). I got a result like:

632   1   2   1  200  10  3.474  5075.  3643.  0.13
633   1   3   1  200  10  3.515  LMAX=   COV=   NaN

Then I decided to use a regular expression to define my delimiters. After many trial and error runs (I am no expert in regex), I managed to get close with the following line:

df = pd.read_csv('beta.txt', header=None, sep='/s +|LOADED LANE|MAT. TYPE=|LEFFECT=|SPAN=|SPACE=|BETA=|LOADEFFECT|LMAX=|COV=', engine='python')

This almost works, but creates a NaN column for some reason at the very beginning:

632 NaN  1  2  1  200  10  3.474   5075  3643  0.13
633 NaN  1  3  1  200  10  3.515  10009  9732  0.08

At this point I think I can just delete that first column, and get away with it. However I wonder what would be the correct way to set up the regex to correctly parse this text file in one shot. Any ideas? Other than that, I am sure there is a smarter way to parse this text file. I would be glad to hear your recommendations.

Thanks!

6
  • 1
    That looks more like a fixed-width format to me. If you draw vertical lines in your file, does the data stay in the right columns? Commented Apr 22, 2015 at 16:49
  • read_fwf like DSM has mentioned can read this if the format is fixed-width, it accepts a width param where you can pass a list of each column width Commented Apr 22, 2015 at 16:50
  • @DSM yes, I think it is a fixed width format file. Commented Apr 22, 2015 at 16:57
  • @EdChum I didn't know about read_fwf I think that will fix my problem! Commented Apr 22, 2015 at 16:59
  • Also not sure but your initial issue may be related to line endings? "\r\n" vs "\n" Commented Apr 22, 2015 at 17:09

1 Answer 1

1
import re
import pandas as pd
import csv
csvfile = open("parsing.txt") #open text file
reader = csv.reader(csvfile)
new_list=[]
for line in reader:
    for i in line:
        new_list.append(re.findall(r'(\d*\.\d+|\d+)', i))

table = pd.DataFrame(new_list)
table # output will be pandas DataFrame with values
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.