
I have a text file that contains multiple lines in the format given below:

real    0m0.020s
user    0m0.000s
sys 0m0.000s
Round  1  completed. with matrix size of  1200 x 1200 with threads 8

real    0m0.022s
user    0m0.000s
sys 0m0.001s
Round  2  completed. with matrix size of  1200 x 1200 with threads 8

There are about 500 entries of this sort (the above shows two of them). I can't figure out how to get them into a pandas DataFrame that looks something like this:

Matrix Size    Threads    Round    Real    User    Sys
1200 x 1200    8          1        0.020   0.000   0.000
1200 x 1200    8          2        0.022   0.000   0.001

Is there a way, using regex or otherwise, to convert this output into a DataFrame? Also, I'm not sure I'm reading the times correctly: I take 0m to mean 0 minutes and 0.020s to mean 0.020 seconds.

  • Are there always two newlines between blocks that will each form a row of the dataframe? Commented Apr 25, 2019 at 1:11
  • I bet the time you spend asking this question and waiting for an answer is enough for you to write and run a simple for-loop solution on those 500 entries :-) Commented Apr 25, 2019 at 1:12
  • Yeah, each block will form a record and there are two newlines between them. Commented Apr 25, 2019 at 1:12
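
For reference, the plain-loop approach suggested in the comments might look roughly like this. It is only a sketch: `parse_blocks` and the dictionary keys are made-up names, and it assumes every block has the four lines shown in the question, with a blank line between blocks.

```python
# Sketch of a plain-loop parser; `parse_blocks` and the dict keys are
# illustrative names, not taken from the question or the answers.
def parse_blocks(text):
    rows = []
    # Assumption: blocks are separated by one blank line ("\n\n").
    for block in text.strip().split("\n\n"):
        row = {}
        for line in block.splitlines():
            parts = line.split()
            if not parts:
                continue
            if parts[0] in ("real", "user", "sys"):
                # e.g. "real    0m0.020s" -> row["real"] = "0m0.020s"
                row[parts[0]] = parts[1]
            elif parts[0] == "Round":
                # "Round N completed. with matrix size of A x B with threads T"
                row["round"] = int(parts[1])
                of = parts.index("of")
                row["matrix size"] = " ".join(parts[of + 1:of + 4])
                row["threads"] = int(parts[-1])
        rows.append(row)
    return rows


sample = """real    0m0.020s
user    0m0.000s
sys 0m0.000s
Round  1  completed. with matrix size of  1200 x 1200 with threads 8

real    0m0.022s
user    0m0.000s
sys 0m0.001s
Round  2  completed. with matrix size of  1200 x 1200 with threads 8"""

rows = parse_blocks(sample)
print(rows[0])
```

Passing `rows` to `pd.DataFrame(rows)` should then give one row per block, with the dictionary keys as column names.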

2 Answers


You can use a regex:

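import re
import pandas as pd

# Read the whole log into one string ('timings.txt' is a placeholder file name)
with open('timings.txt') as f:
    data = f.read()

# One match per block: the three timing lines followed by the "Round" line.
# \d+ allows multi-digit minutes and seconds, e.g. 12m34.567s.
regex = re.compile(r'real +(\d+m\d+\.\d+s)\nuser +(\d+m\d+\.\d+s)\nsys +(\d+m\d+\.\d+s)\nRound +(\d+).+of +(\d+ x \d+).+threads (\d+)')

# findall returns one tuple of captured groups per block
df = pd.DataFrame(regex.findall(data), columns=['real', 'user', 'sys', 'round', 'matrix size', 'threads'])

print(df)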

Output:

       real      user       sys round  matrix size threads
0  0m0.020s  0m0.000s  0m0.000s     1  1200 x 1200       8
1  0m0.022s  0m0.000s  0m0.001s     2  1200 x 1200       8

2 Comments

Is there a way I could convert the 0m0.020s to (0*60)[from the m] + (0.020)[from the s]?
@user9996043 How about df['real'].str.replace('s', '').str.split('m').map(lambda t: float(t[0]) * 60 + float(t[1]))?
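
To make the conversion in this comment concrete: the `XmY.YYYs` format is minutes followed by seconds, so `0m0.020s` is 0 minutes plus 0.020 seconds. Below is a small sketch of the same arithmetic as a standalone function. `to_seconds` is a hypothetical helper name, and it assumes every value has exactly this shape.

```python
import re

# Hypothetical helper (not from the answers): "XmY.YYYs" -> seconds as a float.
_TIME_RE = re.compile(r"(\d+)m(\d+(?:\.\d+)?)s")


def to_seconds(stamp):
    match = _TIME_RE.fullmatch(stamp.strip())
    if match is None:
        # Fail loudly on anything that is not minutes + seconds.
        raise ValueError(f"unexpected time format: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


print(to_seconds("0m0.020s"))  # 0.02
print(to_seconds("1m2.500s"))  # 62.5
```

It could then be applied per column, for example `df['real'] = df['real'].map(to_seconds)`, and likewise for `user` and `sys`.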

If you want to solve the problem using only pandas you can use str.split():

import pandas as pd

# data
s = """real    0m0.020s
user    0m0.000s
sys 0m0.000s
Round  1  completed. with matrix size of  1200 x 1200 with threads 8

real    0m0.022s
user    0m0.000s
sys 0m0.001s
Round  2  completed. with matrix size of  1200 x 1200 with threads 8"""

# split on blank lines to get one string per block, then split each block on the
# labels and line breaks; the lambda drops the empty strings the split leaves behind
df = pd.DataFrame(s.split('\n\n'))[0].str.split('   |real | with |user    |sys |matrix size of  |threads |\n')\
                                  .apply(lambda parts: [p for p in parts if p]).apply(pd.Series)

# column 3 holds e.g. 'Round  1  completed.'; keep only the round number
df[3] = df[3].str.extract(r'(\d+)', expand=False)

# rename columns
df.columns = ['real', 'user', 'sys', 'round', 'matrix size', 'threads']

Output:

       real      user       sys round  matrix size threads
0  0m0.020s  0m0.000s  0m0.000s     1  1200 x 1200       8
1  0m0.022s  0m0.000s  0m0.001s     2  1200 x 1200       8

Note that this will be slower than gmds' regex answer:

str.split: 1000 loops, best of 3: 4.42 ms per loop
regex:     1000 loops, best of 3: 1.84 ms per loop
