String arrays in float arrays without format change

Question

Is there a way to include a string in an array of floats without the whole array changing type, so that the floats stay floats and the string element stays a string?

eg.

import numpy as np

a = np.array([ 'hi' , 1. , 2. , 3. ])

Ideally I would like the format to remain the same as how it looks when input as 'a' above.

This gives:

array(['hi', '1.0', '2.0', '3.0'], dtype='|S3')

And then how would one save such an array as a text file?

Many thanks,

J

Mention python in the tags (since that's what you are using, I assume).
– Arvind Sasikumar, Commented Jul 5, 2017 at 8:58
Aren't lists in Python heterogeneous anyway? I don't understand the problem you are facing...
– Arvind Sasikumar, Commented Jul 5, 2017 at 9:00
The problem is that when I create this array and try to save it as a text file, it fails because of a mismatch between the array dtype and the format specifier. Essentially, I'm asking how to save an array containing both strings and floats so that I can read the text file back in later and extract the strings and floats.
– user8188120, Commented Jul 5, 2017 at 9:06
While specifying dtype=object might solve some of your problems, this is not how NumPy was designed to work, and using object arrays will cause weird incompatibilities and destroy most of the advantages NumPy arrays have over plain lists.
– user2357112, Commented Jul 5, 2017 at 9:08
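
For readers unfamiliar with the dtype=object option mentioned in the comment above, here is a minimal sketch of what it does. It is only an illustration of the behaviour, not a recommendation, and the variable name doubled is made up for this example.

```python
import numpy as np

# dtype=object stores references to the original Python objects,
# so the floats are not coerced to strings.
a = np.array(['hi', 1., 2., 3.], dtype=object)

# Each element keeps its own Python type.
element_types = [type(x).__name__ for x in a]
print(element_types)

# Arithmetic on the numeric slice still works, but it is carried out
# element by element on Python objects rather than as a fast float operation.
doubled = a[1:] * 2
print(doubled.dtype)
```

The result of the arithmetic is still an object array, which is the loss of NumPy's usual advantages that the comment warns about.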

Astrokiwi · Accepted Answer · 2017-07-06 08:05:24Z

3

I'm guessing your problem is this: you want to dump out the array np.array([ 'hi' , 1. , 2. , 3. ]) using np.savetxt() but are getting this error:

TypeError: Mismatch between array dtype ('|S3') and format specifier ('%.18e')

If this is the case, you just need to set the fmt kwarg in np.savetxt. Instead of the default %.18e, which is for formatting floating point data, you can use %s, which formats things as a string, even if the original value in the array was numerical.

So this will work:

import numpy as np

# The mixed list is coerced to a string array; "%s" writes each element as text.
a = np.array(['hi', 1., 2., 3.])
np.savetxt("test.out", a, fmt="%s")

Note that you can just do this with the original list - numpy will convert it to an array for you. So for example you can do:

np.savetxt("test.out",[ 'hi' , 1. , 2. , 3. ],fmt="%s")

and it should work fine too.

For the first part of the question, this is not really what numpy arrays are intended for. If you are trying to put different data types into the same array, then you probably want a different data structure. A vanilla python list would do it, but depending on your situation, a dict is probably what you're looking for.
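
As a rough illustration of the dict suggestion, one option is to keep the string and the floats under separate keys so the numbers stay in a proper float array. The key names "label" and "values" are hypothetical, chosen only for this sketch.

```python
import numpy as np

# Keep the string and the numeric data apart instead of forcing
# them into a single array with one dtype.
record = {
    "label": "hi",
    "values": np.array([1., 2., 3.]),
}

# The numeric part is still a normal float64 array.
mean_value = record["values"].mean()
print(record["label"], record["values"].dtype, mean_value)
```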

Edit: Based on the comment threads & the specific question, it looks like this is an attempt to make a header on a data file. This can be done directly through

np.savetxt("a.txt",a,header="title goes here")

This can be read directly with np.loadtxt() because by default the header is prepended with #, and by default np.loadtxt() ignores lines that start with #.
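
A small round-trip sketch of the header approach follows. It assumes the array passed to np.savetxt holds only the numeric values (the string goes in the header instead), and it writes into a temporary directory purely to keep the example self-contained.

```python
import os
import tempfile

import numpy as np

values = np.array([1., 2., 3.])  # numeric data only; the title lives in the header

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "a.txt")

    # The header line is written first, prefixed with "# " by default.
    np.savetxt(path, values, header="title goes here")

    # Read the first line back to see the header as stored in the file.
    with open(path) as f:
        first_line = f.readline().rstrip("\n")

    # loadtxt skips lines starting with "#", so only the floats come back.
    loaded = np.loadtxt(path)

print(first_line)
print(loaded)
```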

edited Jul 6, 2017 at 8:05

answered Jul 5, 2017 at 9:15

Astrokiwi


6 Comments

user8188120 Over a year ago

That is the exact error I'm having, thank you. That worked for saving the file but if you want to then load that array you get an error of not being able to convert string to float for the 'hi' element. Any ideas on that one? Thank you by the way!

user8188120 Over a year ago

Never mind I did a quick search and found out how to do this, thank you Astrokiwi!

Astrokiwi Over a year ago

Could you give the method you used so I can add it to my answer? It will be useful for googlers of the future.

user8188120 Over a year ago

Yeah, that's fine. It went as follows:

data = np.loadtxt('a.txt')
data_floats = data[0,1:].astype(float)
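
For reference, here is a minimal sketch of reading a file written with fmt="%s" back in. It is not necessarily the exact method used in the comment above: it assumes a one-dimensional array (so the indexing is data[0] and data[1:] rather than data[0,1:]), passes dtype=str to np.loadtxt so that 'hi' is not parsed as a number, and uses a temporary directory only to stay self-contained.

```python
import os
import tempfile

import numpy as np

a = np.array(['hi', 1., 2., 3.])  # coerced to an array of strings

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "test.out")
    np.savetxt(path, a, fmt="%s")

    # Load everything as strings first; the default float dtype
    # would fail on the 'hi' element.
    data = np.loadtxt(path, dtype=str)

# Split the string element from the numeric ones and convert the latter.
label = str(data[0])
data_floats = data[1:].astype(float)

print(label, data_floats)
```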

Astrokiwi Over a year ago

Hmm - is the reason you're mixing strings & floats because you want to have a header/title line at the top of your output file? In that case, you can do np.savetxt("a.txt",a,header="title goes here"), and it will start the file with #title goes here. Then you can just use loadtxt directly, because it will ignore any line that starts with #.


Flomp · Accepted Answer · 2017-07-05 09:09:14Z

1

Use pickle:

import pickle

a = ['abc', 3, 4, 5, 6, 7.0]

# Context managers close the files even if dump/load raises.
with open("save.p", "wb") as f:
    pickle.dump(a, f)
with open("save.p", "rb") as f:
    b = pickle.load(f)

print(b)

Output:

['abc', 3, 4, 5, 6, 7.0]

answered Jul 5, 2017 at 9:09

Flomp


3 Comments

user8188120 Over a year ago

Thanks! How would this work if you try and vstack two arrays of the same style as 'a' and then save/open?

Flomp Over a year ago

If you only need to vstack, you can still do it with lists: c = [a] + [b]. However, if you need more of what numpy offers, you should consider using pandas if you really have to mix datatypes.
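
The list-based stacking described above can be sketched as follows, combined with a pickle round trip. The second list b and the temporary directory are assumptions made for the sake of a self-contained example.

```python
import os
import pickle
import tempfile

a = ['abc', 3, 4, 5, 6, 7.0]
b = ['def', 1, 2, 3, 4, 5.0]  # hypothetical second row in the same style as a

# "Stack" the two rows as a list of lists; element types are untouched.
c = [a] + [b]

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "save.p")
    with open(path, "wb") as f:
        pickle.dump(c, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

# Rows and element types come back exactly as they were saved.
print(restored)
```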

user8188120 Over a year ago

Okay, thanks. I'll bear that in mind and try saving as one data type, extracting the columns I need, and then converting those to different data types separately instead.
