
I have a file test.txt which has an array:

array = [3,5,6,7,9,6,4,3,2,1,3,4,5,6,7,8,5,3,3,44,5,6,6,7]

Now what I want to do is get the contents of the array and perform some calculations with it. But the problem is that when I do open("test.txt"), it gives me the content as a string. The array is actually very big, and looping over it to convert each element might not be efficient. Is there any way to get the contents without splitting on , myself? Any new ideas?

Why not just make a .py file with the data? Commented Jun 10, 2012 at 0:34

7 Answers

9

I recommend that you save the file as JSON instead and read it in with the json module. Either that, or make it a .py file and import it as Python. A .txt file that looks like a Python assignment is kind of odd.
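A minimal sketch of the JSON approach (the .json filename is an assumption, not from the question):

```python
import json

# write the array out as JSON once
array = [3, 5, 6, 7, 9]
with open("test.json", "w") as f:
    json.dump(array, f)

# read it back as a real Python list, ready for calculations
with open("test.json") as f:
    loaded = json.load(f)

print(loaded)  # -> [3, 5, 6, 7, 9]
```

json.load parses the numbers for you, so no manual splitting or casting is needed.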



5

Does your text file need to look like python syntax? A list of comma separated values would be the usual way to provide data:

1,2,3,4,5

Then you could read/write with the csv module or the numpy functions mentioned in another answer. There's a lot of documentation about how to read csv data in efficiently. Once you have a csv reader set up, the data can be loaded with something like:

data = [[float(value) for value in row] for row in csvreader]
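A self-contained sketch of that idea with the csv module (the filename and sample values are assumptions):

```python
import csv

# write one row of comma-separated values
with open("data.csv", "w", newline="") as f:
    csv.writer(f).writerow([1, 2, 3, 4, 5])

# read it back, converting each field to float
with open("data.csv", newline="") as f:
    data = [[float(value) for value in row] for row in csv.reader(f)]

print(data)  # -> [[1.0, 2.0, 3.0, 4.0, 5.0]]
```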


5

If you want to store a python-like expression in a file, store only the expression (i.e. without array =) and parse it using ast.literal_eval().

However, consider using a different format such as JSON. Depending on the calculations you might also want to consider using a format where you do not need to load all data into memory at once.
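For instance, if the file contains just the bare expression, ast.literal_eval() parses it safely (a sketch; the expr.txt filename is an assumption):

```python
import ast

# the file holds only the expression, without "array = "
with open("expr.txt", "w") as f:
    f.write("[3, 5, 6, 7, 9]")

with open("expr.txt") as f:
    array = ast.literal_eval(f.read())

print(array[0] + array[-1])  # -> 12
```

Unlike eval(), literal_eval() only accepts Python literals, so a malicious file cannot run arbitrary code.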


2

Must the array be saved as a string? Could you use a pickle file and save it as a Python list?

If not, could you try lazy evaluation? Maybe only process sections of the array as needed.

Possibly, if there are calculations on the entire array that you must always do, it might be a good idea to pre-compute those results and store them in the txt file either in addition to the list or instead of the list.
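A hedged sketch of the lazy-evaluation idea: a generator that yields one number at a time instead of materializing the whole list (the filename and comma-separated format are assumptions):

```python
def lazy_values(path):
    """Yield numbers from a comma-separated file one at a time."""
    with open(path) as f:
        for line in f:
            for field in line.split(","):
                field = field.strip(" []\n")
                if field:
                    yield float(field)

# set up a small sample file
with open("nums.txt", "w") as f:
    f.write("[1,2,3,4,5]")

# compute a running total without ever building the full list
total = sum(lazy_values("nums.txt"))
print(total)  # -> 15.0
```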


2

You could also use numpy to load the data from the file using numpy.genfromtxt or numpy.loadtxt. Both are pretty fast and both have the ability to do the recasting on load. If the array is already loaded though, you can use numpy to convert it to an array of floats, and that is really fast.

import numpy as np
a = np.array(["1", "2", "3", "4"])
a = a.astype(float)  # np.float was removed in NumPy 1.24; use float or np.float64
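A sketch of the numpy.loadtxt route on a comma-separated file (the filename and sample data are assumptions):

```python
import numpy as np

# set up a small sample file of comma-separated numbers
with open("nums.csv", "w") as f:
    f.write("3,5,6,7,9\n")

# loadtxt parses and casts to float in one step
a = np.loadtxt("nums.csv", delimiter=",")
print(a.sum())  # -> 30.0
```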


1

You could write a parser. They are very straightforward, and much faster than regular expressions (please don't use those — not that anyone suggested it).

# open the file in text mode so we can work with strings
stream = open("file_full_of_numbers.txt", "r")
prefix = ''  # end of the last chunk
full_number_list = []

# read the file one chunk at a time
while True:
    # just a small 1 KB chunk
    buffer = stream.read(1024)
    # no more data is left in the file
    if buffer == '':
        break
    # delimit this chunk of data by commas
    split_result = buffer.split(",")
    # prepend the end of the last chunk to the first number
    split_result[0] = prefix + split_result[0]
    # save the end of the buffer (possibly a partial number) for the next loop
    prefix = split_result[-1]
    # only work with complete fields, so skip the last one
    numbers = split_result[0:-1]
    # do something with the numbers we got (like collect them in a list)
    full_number_list += numbers

stream.close()
# now full_number_list contains all the numbers (except the final prefix) in text format

You'll also have to add some logic to flush the final prefix once the buffer comes back empty. But I'll leave that code up to you.


1

OK, so the following methods ARE dangerous. Since they can be used to attack systems by injecting code into them, use them at your own risk.

array = eval(open("test.txt", 'r').read().strip('array = '))
execfile('test.txt')  # Python 2 only; this is the fastest but most dangerous.

Safer methods:

import ast
array = ast.literal_eval(open("test.txt", 'r').read().strip('array = '))
  ...
array = [float(value) for value in open('test.txt', 'r').read().strip('array = [').strip('\n]').split(',')]

The easiest way to serialize Python objects so you can load them later is to use pickle, assuming you don't want a human-readable format, since that adds major overhead. Otherwise, csv is fast and json is flexible.

import pickle
import random
array = random.sample(range(10**3), 20)
pickle.dump(array, open('test.obj', 'wb'))

loaded_array = pickle.load(open('test.obj', 'rb'))
assert array == loaded_array

pickle does have some overhead, and if you need to serialize large objects you can specify the protocol version: the default (protocol 0) is the oldest, ASCII-based format, while pickle.HIGHEST_PROTOCOL selects the newest and most compact one: pickle.dump(array, open('test.obj', 'wb'), pickle.HIGHEST_PROTOCOL)

If you are working with large numerical or scientific data sets then use numpy.tofile/numpy.fromfile or scipy.io.savemat/scipy.io.loadmat they have little overhead, but again only if you are already using numpy/scipy.

good luck.

2 Comments

ast.literal_eval() would be the better and more secure choice.
Yep, but in the Python community we always assume we are all consenting adults who know what we are doing... Too bad that's rarely the case.
