I have a specific file format from CNC (work center) data, saved as .txt. I want to read this table into a pandas DataFrame, but I have never seen this format before.

_MASCHINENNUMMER    : >0-251-11-0950/51<     SACHBEARB.: >BSTWIN32<
_PRODUKTSCHLUESSEL  : >BST 500<           DATUM     : >05-20-2016<
---------------------------------------------------------------------------
*BOHRKOPF !SPINDEL!WK!DELTA-X   !DELTA-Y   !DURCHMESSER! KOMMENTAR
----------+----------+----------+----------+-----------+-------------------
[NoValidForUse]
A21       !      1!62!     0.000!     0.000!      0.000!
[V11]
A12       !     -1!62!     0.000!  -160.000!      0.000!
A12       !      2!62!     0.000!  -128.000!      3.000!  70.0
A12       !     -3!62!     0.000!   -96.000!      0.000!
A12       !      4!62!     0.000!   -64.000!      0.000!
---------------------------------------------------------------------------
*BOHRKOPF !SPINDEL!WK!DELTA-X   !DELTA-Y   !DURCHMESSER! KOMMENTAR
----------+----------+----------+----------+-----------+-------------------
[V11]
O11       !     -9!62!     0.000!   -96.000!      0.000!
O11       !     10!62!     0.000!  -128.000!      5.000!  70.0

Questions: 1. Is it possible to read this and convert it to a pandas DataFrame? 2. How can I do this?

  • Why a pandas DataFrame? I want to use this data for some analysis of the items' characteristics. For analysis I always use pandas. Maybe I need a different approach for this?

Expected output:

Two pandas DataFrames. First:

---------------------------------------------------------------------------------------
*BOHRKOPF !SPINDEL!WK!DELTA-X   !DELTA-Y   !DURCHMESSER! KOMMENTAR ! TYPE
----------+----------+----------+----------+-----------+-------------------------------
A21       !      1!62!     0.000!     0.000!      0.000!           !NoValidForUse
A12       !     -1!62!     0.000!  -160.000!      0.000!           !V11
A12       !      2!62!     0.000!  -128.000!      3.000!  70.0     !V11
A12       !     -3!62!     0.000!   -96.000!      0.000!           !V11
A12       !      4!62!     0.000!   -64.000!      0.000!           !V11

And second:

---------------------------------------------------------------------------------------
*BOHRKOPF !SPINDEL!WK!DELTA-X   !DELTA-Y   !DURCHMESSER! KOMMENTAR ! TYPE
----------+----------+----------+----------+-----------+-------------------------------
O11       !     -9!62!     0.000!   -96.000!      0.000!           !V11
O11       !     10!62!     0.000!  -128.000!      5.000!  70.0     !V11

The headers of DataFrame 1 and DataFrame 2 can be different:

_MASCHINENNUMMER    : >0-251-11-0950/51<     SACHBEARB.: >BSTWIN32<
_PRODUKTSCHLUESSEL  : >BST 500<           DATUM     : >05-20-2016<
---------------------------------------------------------------------------
*BOHRKOPF !SPINDEL!WK!DELTA-X   !DELTA-Y   !DURCHMESSER! KOMMENTAR
----------+----------+----------+----------+-----------+-------------------
[NoValidForUse]
A21       !      1!62!     0.000!     0.000!      0.000!
[V11]
A12       !     -1!62!     0.000!  -160.000!      0.000!
A12       !      2!62!     0.000!  -128.000!      3.000!  70.0
A12       !     -3!62!     0.000!   -96.000!      0.000!
 ---------------------------------------------------------------------------
*BOHRKOPF !          !X-POS     !Y-POS     !           ! 
----------+----------+----------+----------+-----------+-------------------
[V11]
O11       !          !     0.000!   -96.000!           !
O11       !          !     0.000!  -128.000!           !  
  • A file can contain a different number of DataFrames, between 5 and 10, but the structure of the file is the same: the separator is "!" and header rows start with "*".
  • What is the expected output? Commented Mar 3, 2018 at 8:37
  • I added more info to the post. :) Commented Mar 3, 2018 at 8:49

1 Answer

Yes, it is possible, but it really depends on the data:

  • first, use read_csv, skipping the first 3 rows and the whitespace after each separator
  • strip trailing whitespace from the column names with strip
  • create the TYPE column by extracting the values between [] and forward filling them to the following rows
  • create a helper column G that distinguishes each DataFrame, using startswith and cumsum
  • finally, use contains to remove the rows whose first column starts with [, -- or *

import pandas as pd

# file is the path to the .txt file
df = pd.read_csv(file, sep="!", skiprows=3, skipinitialspace=True)
df.columns = df.columns.str.strip()
# raw strings keep the regex escapes intact
df['TYPE'] = df['*BOHRKOPF'].str.extract(r'\[(.*)\]', expand=False).ffill()
df['G'] = df['*BOHRKOPF'].str.startswith('*').cumsum()
df = df[~df['*BOHRKOPF'].str.contains(r'^\[|^--|^\*')]
print (df)
     *BOHRKOPF SPINDEL  WK DELTA-X   DELTA-Y DURCHMESSER KOMMENTAR  \
2   A21              1  62   0.000     0.000       0.000       NaN   
4   A12             -1  62   0.000  -160.000       0.000       NaN   
5   A12              2  62   0.000  -128.000       3.000      70.0   
6   A12             -3  62   0.000   -96.000       0.000       NaN   
7   A12              4  62   0.000   -64.000       0.000       NaN   
12  O11             -9  62   0.000   -96.000       0.000       NaN   
13  O11             10  62   0.000  -128.000       5.000      70.0   

             TYPE  G  
2   NoValidForUse  0  
4             V11  0  
5             V11  0  
6             V11  0  
7             V11  0  
12            V11  1  
13            V11  1  

Then filter by the G column:

df1 = df[df['G'] == 0].drop('G', axis=1)
print (df1)
    *BOHRKOPF SPINDEL  WK DELTA-X   DELTA-Y DURCHMESSER KOMMENTAR  \
2  A21              1  62   0.000     0.000       0.000       NaN   
4  A12             -1  62   0.000  -160.000       0.000       NaN   
5  A12              2  62   0.000  -128.000       3.000      70.0   
6  A12             -3  62   0.000   -96.000       0.000       NaN   
7  A12              4  62   0.000   -64.000       0.000       NaN   

            TYPE  
2  NoValidForUse  
4            V11  
5            V11  
6            V11  
7            V11  

df2 = df[df['G'] == 1].drop('G', axis=1)
print (df2)
     *BOHRKOPF SPINDEL  WK DELTA-X   DELTA-Y DURCHMESSER KOMMENTAR TYPE
12  O11             -9  62   0.000   -96.000       0.000       NaN  V11
13  O11             10  62   0.000  -128.000       5.000      70.0  V11

If the file contains multiple DataFrames, a list comprehension can build a list of DataFrames:

dfs = [v.drop('G', axis=1) for k, v in df.groupby('G')]
print (dfs[0])
    *BOHRKOPF SPINDEL  WK DELTA-X   DELTA-Y DURCHMESSER KOMMENTAR  \
2  A21              1  62   0.000     0.000       0.000       NaN   
4  A12             -1  62   0.000  -160.000       0.000       NaN   
5  A12              2  62   0.000  -128.000       3.000      70.0   
6  A12             -3  62   0.000   -96.000       0.000       NaN   
7  A12              4  62   0.000   -64.000       0.000       NaN   

            TYPE  
2  NoValidForUse  
4            V11  
5            V11  
6            V11  
7            V11  

print (dfs[1])
     *BOHRKOPF SPINDEL  WK DELTA-X   DELTA-Y DURCHMESSER KOMMENTAR TYPE
12  O11             -9  62   0.000   -96.000       0.000       NaN  V11
13  O11             10  62   0.000  -128.000       5.000      70.0  V11
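
Since the goal is analysis, note that the parsed values are probably still strings (object dtype), because header and separator rows shared the columns while reading. A minimal sketch of converting them afterwards, assuming a small hand-built stand-in table; the helper name to_numeric_columns and its text_columns parameter are hypothetical, not part of pandas:

```python
import pandas as pd

# Hypothetical stand-in for one parsed table: every value is still a string.
table = pd.DataFrame({
    "*BOHRKOPF": ["A21", "A12", "A12"],
    "SPINDEL": ["1", "-1", "2"],
    "DELTA-Y": ["0.000", "-160.000", "-128.000"],
    "KOMMENTAR": [None, None, "70.0"],
    "TYPE": ["NoValidForUse", "V11", "V11"],
})

def to_numeric_columns(frame, text_columns=("*BOHRKOPF", "TYPE")):
    """Return a copy with every column outside text_columns converted to numbers."""
    out = frame.copy()
    for col in out.columns.difference(list(text_columns)):
        # errors="coerce" turns anything unparseable into NaN instead of raising
        out[col] = pd.to_numeric(out[col], errors="coerce")
    return out

numeric = to_numeric_columns(table)
print(numeric.dtypes)
```

The same helper could be applied to each element of dfs, for example [to_numeric_columns(d) for d in dfs], provided the text columns are named as assumed here.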

EDIT:

temp=u"""_MASCHINENNUMMER    : >0-251-11-0950/51<     SACHBEARB.: >BSTWIN32<
_PRODUKTSCHLUESSEL  : >BST 500<           DATUM     : >05-20-2016<
---------------------------------------------------------------------------
*BOHRKOPF !SPINDEL!WK!DELTA-X   !DELTA-Y   !DURCHMESSER! KOMMENTAR
----------+----------+----------+----------+-----------+-------------------
[NoValidForUse]
A21       !      1!62!     0.000!     0.000!      0.000!
[V11]
A12       !     -1!62!     0.000!  -160.000!      0.000!
A12       !      2!62!     0.000!  -128.000!      3.000!  70.0
A12       !     -3!62!     0.000!   -96.000!      0.000!
A12       !      4!62!     0.000!   -64.000!      0.000!
---------------------------------------------------------------------------
*BOHRKOPF !          !X-POS     !Y-POS     !           ! 
----------+----------+----------+----------+-----------+-------------------
[V11]
O11       !          !     0.000!   -96.000!           !
O11       !          !     0.000!  -128.000!           !  """

Add the parameter header=None to get default column names:

from io import StringIO

# after testing, replace StringIO(temp) with the file name, e.g. 'filename.csv'
df = pd.read_csv(StringIO(temp), sep="!", skiprows=3, skipinitialspace=True, header=None)
df['TYPE'] = df[0].str.extract(r'\[(.*)\]', expand=False).ffill()
df['G'] = df[0].str.startswith('*').cumsum()
# keep the rows starting with *, they are needed later as column names
df = df[~df[0].str.contains(r'^\[|^--')]

print (df)
             0        1           2           3           4            5  \
0   *BOHRKOPF   SPINDEL          WK  DELTA-X     DELTA-Y     DURCHMESSER   
3   A21               1          62       0.000       0.000        0.000   
5   A12              -1          62       0.000    -160.000        0.000   
6   A12               2          62       0.000    -128.000        3.000   
7   A12              -3          62       0.000     -96.000        0.000   
8   A12               4          62       0.000     -64.000        0.000   
10  *BOHRKOPF       NaN  X-POS       Y-POS              NaN          NaN   
13  O11             NaN       0.000     -96.000         NaN          NaN   
14  O11             NaN       0.000    -128.000         NaN          NaN   

            6           TYPE  G  
0   KOMMENTAR            NaN  1  
3         NaN  NoValidForUse  1  
5         NaN            V11  1  
6        70.0            V11  1  
7         NaN            V11  1  
8         NaN            V11  1  
10        NaN            V11  2  
13        NaN            V11  2  
14        NaN            V11  2  

For each group in the loop: drop column G, rename all columns except the last 2 (TYPE and G) using the first row, remove the first row with iloc and finally, if necessary, drop the columns that contain only NaNs with dropna:

dfs = [v.drop('G', axis=1).rename(columns=v.iloc[0, :-2]).iloc[1:].dropna(axis=1, how='all') for k, v in df.groupby('G')]
print (dfs[0])
   *BOHRKOPF  SPINDEL  WK DELTA-X    DELTA-Y    DURCHMESSER KOMMENTAR  \
3  A21              1  62      0.000      0.000       0.000       NaN   
5  A12             -1  62      0.000   -160.000       0.000       NaN   
6  A12              2  62      0.000   -128.000       3.000      70.0   
7  A12             -3  62      0.000    -96.000       0.000       NaN   
8  A12              4  62      0.000    -64.000       0.000       NaN   

            TYPE  
3  NoValidForUse  
5            V11  
6            V11  
7            V11  
8            V11 

print (dfs[1])
    *BOHRKOPF  X-POS      Y-POS      TYPE
13  O11             0.000    -96.000  V11
14  O11             0.000   -128.000  V11
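
As an alternative to the read_csv approach above, the file can also be split line by line in plain Python before building the DataFrames. This is only a sketch, not the method used above: the function name parse_cnc_tables is hypothetical, and it assumes that metadata lines start with "_", separator lines start with "-", header rows start with "*" and data rows never start with any of these characters.

```python
import pandas as pd

sample = """_MASCHINENNUMMER    : >0-251-11-0950/51<     SACHBEARB.: >BSTWIN32<
---------------------------------------------------------------------------
*BOHRKOPF !SPINDEL!WK!DELTA-X   !DELTA-Y   !DURCHMESSER! KOMMENTAR
----------+----------+----------+----------+-----------+-------------------
[NoValidForUse]
A21       !      1!62!     0.000!     0.000!      0.000!
[V11]
A12       !      2!62!     0.000!  -128.000!      3.000!  70.0
---------------------------------------------------------------------------
*BOHRKOPF !          !X-POS     !Y-POS     !           !
----------+----------+----------+----------+-----------+-------------------
[V11]
O11       !          !     0.000!   -96.000!           !
"""

def parse_cnc_tables(text):
    """Return one DataFrame per header row (a row starting with '*')."""
    tables = []          # list of (header cells, data rows) pairs
    current_type = None  # last value seen between [ ], carried forward
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("_", "-")):
            continue  # blank lines, metadata lines and separator lines
        if line.startswith("["):
            current_type = line.strip("[]")
        elif line.startswith("*"):
            tables.append(([cell.strip() for cell in line.split("!")], []))
        elif tables:
            header, rows = tables[-1]
            cells = [cell.strip() for cell in line.split("!")]
            cells += [""] * (len(header) - len(cells))  # pad short rows
            rows.append(cells[:len(header)] + [current_type])

    frames = []
    for header, rows in tables:
        names = header + ["TYPE"]
        # keep only the columns that have a non-empty header cell
        keep = [i for i, name in enumerate(names) if name]
        frames.append(pd.DataFrame(
            [[row[i] for i in keep] for row in rows],
            columns=[names[i] for i in keep],
        ))
    return frames

frames = parse_cnc_tables(sample)
for frame in frames:
    print(frame)
```

All values stay strings in this sketch, and empty cells are empty strings rather than NaN, so a numeric conversion would still be needed before analysis.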

4 Comments

This solution works well if the headers are always the same as in line 4.
@ArnoldasBankauskas - I see the edit. Are there only 2 DataFrames in each file? Or more?
A file can contain a different number of DataFrames, between 5 and 10, but the structure of the file is the same: the separator is "!" and header rows start with "*".
Brilliant, now I understand how to deal with this.
