remove string from text file keep float

Question

I'm looking to remove lines that contain strings, as well as empty lines, from a text file. It looks like the example below. As you can see, the header repeats itself throughout the file. The number of data lines varies from block to block. I need to import the data as an array in NumPy. At first the file used a comma as the decimal point, but at least I was able to change that.

I tried this but it doesn't work at all:

from types import StringType

z = open('D:\Desktop\cycle 1-20 20-50 kPa (dot).dat', 'r')
for line in z.readlines():
    for x in z:
        if type(z.readline(x)) is StringType:
            print line


z.close()

Example of data:

bla bla

cyclical stuff                      Time:   81.095947   Sec 2012-08-02 17:05:42
stored :    1   cycle           stores for :    62  seg-cycle
Points :    4223
Servo_Hyd count Temps   Servo_Air pressure  Servo_Hyd load Servo_Hyd LVDT1  Servo_Hyd LVDT2 Servo_Hyd LVDT3
name1    name1    name1 name1   name1   name1   name1
1   60.102783   0.020013755 89.109558   0.3552089   0.4015148   -0.33822596
1   60.107666   0.020006953 89.025749   0.35519764  0.4015218   -0.33821729
1   60.112549   0.02000189  88.886292   0.3551946   0.4015184   -0.33822691
1   60.117432   0.020007374 89.559196   0.35519707  0.40151948  -0.33823174
1   60.122314   0.019991774 89.741402   0.35519552  0.40151322  -0.33822927
1   60.127197   0.020003742 89.748924   0.35520011  0.40150556  -0.33822462

bla bla

cyclical stuff                      Time:   81.095947   Sec 2012-08-02 17:05:42
stored :    1   cycle           stores for :    62  seg-cycle
Points :    4223
Servo_Hyd count Temps   Servo_Air pressure  Servo_Hyd load Servo_Hyd LVDT1  Servo_Hyd LVDT2 Servo_Hyd LVDT3
name1    name1    name1 name1   name1   name1   name1
1   60.102783   0.020013755 89.109558   0.3552089   0.4015148   -0.33822596
1   60.107666   0.020006953 89.025749   0.35519764  0.4015218   -0.33821729
1   60.112549   0.02000189  88.886292   0.3551946   0.4015184   -0.33822691
1   60.117432   0.020007374 89.559196   0.35519707  0.40151948  -0.33823174
1   60.122314   0.019991774 89.741402   0.35519552  0.40151322  -0.33822927
1   60.127197   0.020003742 89.748924   0.35520011  0.40150556  -0.33822462

if line[0].isdigit(): whatever()

– Steven Rumbalski, Commented Dec 3, 2012 at 16:02

Chris · Accepted Answer · 2012-12-03 23:35:11Z

4

Python reads every line of a file as a string until you convert it yourself, so checking the type of each line cannot separate data from headers, and your method won't work.

Your best bet is probably to use a regular expression to filter out lines with non-data characters in them.

import re

with open("datafile") as f:
    for line in f:
        # Skip any line containing a character other than digits, '-', '.', or whitespace
        if re.search(r"[^-0-9.\s]", line):
            continue
        # Skip empty lines
        if not line.strip():
            continue
        # Keep the rest (strip the trailing newline so print does not double it)
        print(line.rstrip("\n"))
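
If the goal is an array rather than printed text, the same filter can build a list of float rows. This is a minimal sketch and not part of the original answer: the function name `parse_numeric_rows` is hypothetical, and it assumes plain decimal notation (the pattern would also reject values written like 1e-5).

```python
import re

# Any character that is not a digit, '-', '.', or whitespace marks a non-data line.
NON_DATA = re.compile(r"[^-0-9.\s]")

def parse_numeric_rows(lines):
    """Return a list of float rows from an iterable of text lines.

    `lines` can be a list of strings or an open file object,
    since iterating a file yields its lines.
    """
    rows = []
    for line in lines:
        # Skip empty lines and lines with non-numeric characters (headers).
        if not line.strip() or NON_DATA.search(line):
            continue
        rows.append([float(token) for token in line.split()])
    return rows
```

Passing the result to numpy.array gives a 2-D array, as long as every kept row has the same number of columns.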

edited Dec 3, 2012 at 23:35

answered Dec 3, 2012 at 16:05

Chris


2 Comments

Starter2 Over a year ago

Wow, thanks a lot! All I had to do is to modify if re.search("[^-0-9.\s]"): for if re.search("[^-0-9.\s]",line): and rock on.

Chris Over a year ago

@Starter2 can you mark it as the answer if it answers your question? ;)

oz123 · Answer · 2012-12-03 16:32:28Z

0

Why not use numpy.loadtxt? It has a very nice interface for exactly these cases.
See the numpy.loadtxt documentation.

import numpy as np
yourArray = np.loadtxt('yourfilename.txt', skiprows=7)
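
np.loadtxt also accepts a generator of lines, which gives one possible way to cope with headers that repeat through the file. The sketch below is only an illustration under stated assumptions, not the method of this answer: `data_lines` and `load_numeric` are hypothetical helper names, and a line counts as data only if every whitespace-separated token parses as a float.

```python
import numpy as np

def data_lines(lines):
    """Yield only the lines whose tokens all parse as floats."""
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # empty line
        try:
            for token in tokens:
                float(token)
        except ValueError:
            continue  # header or other text line
        yield line

def load_numeric(lines):
    """Load the numeric rows of `lines` into a 2-D NumPy array."""
    # ndmin=2 keeps the result two-dimensional even for a single row.
    return np.loadtxt(data_lines(lines), ndmin=2)
```

With a real file this could be called inside `with open(path) as f:` as `load_numeric(f)`. It still requires every data row to have the same number of columns.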

Also, since you have a header (which, by definition, should appear only at the top of a file), you could split your file into multiple files. You could do it with Python, or you could use the UNIX command csplit. Here is how to do it and what you will get:

oz123@:~/tmp> csplit -k data.txt   '/^bla/' '{*}'
0
787
786
oz123@:~/tmp> ls xx
xx00  xx01  xx02
oz123@:~/tmp> ls xx00
xx00
oz123@:~/tmp> cat xx00
oz123@:~/tmp> cat xx01
bla bla

cyclical stuff                      Time:   81.095947   Sec 2012-08-02 17:05:42
stored :    1   cycle           stores for :    62  seg-cycle
Points :    4223
Servo_Hyd count Temps   Servo_Air pressure  Servo_Hyd load Servo_Hyd LVDT1  Servo_Hyd LVDT2 Servo_Hyd LVDT3
name1    name1    name1 name1   name1   name1   name1
1   60.102783   0.020013755 89.109558   0.3552089   0.4015148   -0.33822596
1   60.107666   0.020006953 89.025749   0.35519764  0.4015218   -0.33821729
1   60.112549   0.02000189  88.886292   0.3551946   0.4015184   -0.33822691
1   60.117432   0.020007374 89.559196   0.35519707  0.40151948  -0.33823174
1   60.122314   0.019991774 89.741402   0.35519552  0.40151322  -0.33822927
1   60.127197   0.020003742 89.748924   0.35520011  0.40150556  -0.33822462

oz123@:~/tmp> cat xx02
bla bla

cyclical stuff                      Time:   81.095947   Sec 2012-08-02 17:05:42
stored :    1   cycle           stores for :    62  seg-cycle
Points :    4223
Servo_Hyd count Temps   Servo_Air pressure  Servo_Hyd load Servo_Hyd LVDT1  Servo_Hyd LVDT2 Servo_Hyd LVDT3
name1    name1    name1 name1   name1   name1   name1
1   60.102783   0.020013755 89.109558   0.3552089   0.4015148   -0.33822596
1   60.107666   0.020006953 89.025749   0.35519764  0.4015218   -0.33821729
1   60.112549   0.02000189  88.886292   0.3551946   0.4015184   -0.33822691
1   60.117432   0.020007374 89.559196   0.35519707  0.40151948  -0.33823174
1   60.122314   0.019991774 89.741402   0.35519552  0.40151322  -0.33822927
1   60.127197   0.020003742 89.748924   0.35520011  0.40150556  -0.33822462
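
The Python route mentioned above could look roughly like this. It is a sketch under assumptions: `split_blocks` is a hypothetical name, and, like the csplit pattern '/^bla/', it assumes each block starts with a line beginning with "bla". Unlike csplit, it does not produce an empty first piece.

```python
def split_blocks(lines, marker="bla"):
    """Group lines into blocks, starting a new block at each line
    that begins with `marker`."""
    blocks = []
    current = []
    for line in lines:
        # A marker line closes the block collected so far, if any.
        if line.startswith(marker) and current:
            blocks.append(current)
            current = []
        current.append(line)
    # Keep the last block, which has no following marker line.
    if current:
        blocks.append(current)
    return blocks
```

Each block is a list of lines, so it can be written to its own file or processed in memory without creating files at all.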

edited Dec 3, 2012 at 16:32

answered Dec 3, 2012 at 16:03

oz123


6 Comments

Steven Rumbalski Over a year ago

Could you give an example? My reading of those docs doesn't show a way to handle headers scattered throughout the file.

oz123 Over a year ago

@StevenRumbalski, I guess it assumes a header is really on the top and not somewhere in the file.

Chris Over a year ago

@Oz123 Which unfortunately is not the situation in the OP's question

oz123 Over a year ago

@Chris, the OP has probably got the whole data file from some instrument. This instrument (and I am guessing here) spits out single files. For some reason, the OP has them stacked into one file. It should not be a problem to split them into multiple files and then read them...

Starter2 Over a year ago

@Oz123, the instrument automatically appends the data of each cycle throughout the test. I didn't stack them. I'm making a GUI to analyse the data, so I don't want the user to import multiple files. It may take more time, but it will be easier for the user.
