
I have a file test.txt which has an array:

array = [3,5,6,7,9,6,4,3,2,1,3,4,5,6,7,8,5,3,3,44,5,6,6,7]

Now what I want to do is get the contents of the array and perform some calculations with it. But the problem is that when I do open("test.txt"), it gives me the content as a string. The array is actually very big, and looping over it to convert each element might not be efficient. Is there any way to get the contents without splitting on , myself? Any new ideas?

Why not just make a .py file with the data? Commented Jun 10, 2012 at 0:34

7 Answers

9

I recommend that you save the file as JSON instead and read it in with the json module. Either that, or make it a .py file and import it as Python. A .txt file that looks like a Python assignment is kind of odd.
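A minimal sketch of the JSON approach (the .json filename is an assumption, not from the question):

```python
import json

# write the array out as JSON once
array = [3, 5, 6, 7, 9]
with open("test.json", "w") as f:
    json.dump(array, f)

# read it back as a real Python list, ready for calculations
with open("test.json") as f:
    loaded = json.load(f)

print(loaded)  # -> [3, 5, 6, 7, 9]
```

json.load parses the numbers for you, so no manual splitting or casting is needed.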



5

Does your text file need to look like python syntax? A list of comma separated values would be the usual way to provide data:

1,2,3,4,5

Then you could read/write with the csv module or the numpy functions mentioned in another answer. There's a lot of documentation about how to read csv data in efficiently. Once you have a csv reader set up, the data can be loaded with something like:

data = [[float(value) for value in row] for row in csvreader]
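A self-contained sketch of that idea with the csv module (the filename and sample values are assumptions):

```python
import csv

# write one row of comma-separated values
with open("data.csv", "w", newline="") as f:
    csv.writer(f).writerow([1, 2, 3, 4, 5])

# read it back, converting each field to float
with open("data.csv", newline="") as f:
    data = [[float(value) for value in row] for row in csv.reader(f)]

print(data)  # -> [[1.0, 2.0, 3.0, 4.0, 5.0]]
```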


5

If you want to store a python-like expression in a file, store only the expression (i.e. without array =) and parse it using ast.literal_eval().

However, consider using a different format such as JSON. Depending on the calculations you might also want to consider using a format where you do not need to load all data into memory at once.
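For instance, if the file contains just the bare expression, ast.literal_eval() parses it safely (a sketch; the expr.txt filename is an assumption):

```python
import ast

# the file holds only the expression, without "array = "
with open("expr.txt", "w") as f:
    f.write("[3, 5, 6, 7, 9]")

with open("expr.txt") as f:
    array = ast.literal_eval(f.read())

print(array[0] + array[-1])  # -> 12
```

Unlike eval(), literal_eval() only accepts Python literals, so a malicious file cannot run arbitrary code.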


2

Must the array be saved as a string? Could you use a pickle file and save it as a Python list?

If not, could you try lazy evaluation? Maybe only process sections of the array as needed.

Possibly, if there are calculations on the entire array that you must always do, it might be a good idea to pre-compute those results and store them in the txt file either in addition to the list or instead of the list.
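A hedged sketch of the lazy-evaluation idea: a generator that yields one number at a time instead of materializing the whole list (the filename and comma-separated format are assumptions):

```python
def lazy_values(path):
    """Yield numbers from a comma-separated file one at a time."""
    with open(path) as f:
        for line in f:
            for field in line.split(","):
                field = field.strip(" []\n")
                if field:
                    yield float(field)

# set up a small sample file
with open("nums.txt", "w") as f:
    f.write("[1,2,3,4,5]")

# compute a running total without ever building the full list
total = sum(lazy_values("nums.txt"))
print(total)  # -> 15.0
```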


2

You could also use numpy to load the data from the file using numpy.genfromtxt or numpy.loadtxt. Both are pretty fast and both have the ability to do the recasting on load. If the array is already loaded though, you can use numpy to convert it to an array of floats, and that is really fast.

import numpy as np
a = np.array(["1", "2", "3", "4"])
a = a.astype(float)  # np.float was removed in NumPy 1.24; use float or np.float64
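A sketch of the numpy.loadtxt route on a comma-separated file (the filename and sample data are assumptions):

```python
import numpy as np

# set up a small sample file of comma-separated numbers
with open("nums.csv", "w") as f:
    f.write("3,5,6,7,9\n")

# loadtxt parses and casts to float in one step
a = np.loadtxt("nums.csv", delimiter=",")
print(a.sum())  # -> 30.0
```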


1

You could write a parser. They are very straightforward, and much faster than regular expressions (please don't use those — not that anyone suggested it).

# open the file in text mode so we can work with strings
stream = open("file_full_of_numbers.txt", "r")
prefix = ''  # end of the last chunk
full_number_list = []

# read the file one chunk at a time
while True:
    # just a small 1 KB chunk
    buffer = stream.read(1024)
    # no more data is left in the file
    if buffer == '':
        break
    # delimit this chunk of data by commas
    split_result = buffer.split(",")
    # prepend the end of the last chunk to the first number
    split_result[0] = prefix + split_result[0]
    # save the end of the buffer (possibly a partial number) for the next loop
    prefix = split_result[-1]
    # only work with complete fields, so skip the last one
    numbers = split_result[0:-1]
    # do something with the numbers we got (like collect them in a list)
    full_number_list += numbers

stream.close()
# now full_number_list contains all the numbers (except the final prefix) in text format

You'll also have to add some logic to flush the final prefix once the buffer comes back empty. But I'll leave that code up to you.


1

OK, so the following methods ARE dangerous. Since they can be used to attack systems by injecting code into them, use them at your own risk.

array = eval(open("test.txt", 'r').read().strip('array = '))
execfile('test.txt')  # Python 2 only; this is the fastest but most dangerous.

Safer methods:

import ast
array = ast.literal_eval(open("test.txt", 'r').read().strip('array = '))
  ...
array = [float(value) for value in open('test.txt', 'r').read().strip('array = [').strip('\n]').split(',')]

The easiest way to serialize Python objects so you can load them later is to use pickle, assuming you don't want a human-readable format, since that adds major overhead. Otherwise, csv is fast and json is flexible.

import pickle
import random
array = random.sample(range(10**3), 20)
pickle.dump(array, open('test.obj', 'wb'))

loaded_array = pickle.load(open('test.obj', 'rb'))
assert array == loaded_array

pickle does have some overhead, and if you need to serialize large objects you can specify the protocol version: the default (protocol 0) is the oldest, ASCII-based format, while pickle.HIGHEST_PROTOCOL selects the newest and most compact one: pickle.dump(array, open('test.obj', 'wb'), pickle.HIGHEST_PROTOCOL)

If you are working with large numerical or scientific data sets then use numpy.tofile/numpy.fromfile or scipy.io.savemat/scipy.io.loadmat they have little overhead, but again only if you are already using numpy/scipy.

good luck.

2 Comments

ast.literal_eval() would be the better and more secure choice.
Yep, but in the Python community we always assume we are all consenting adults who know what we are doing... Too bad that's rarely the case.
