
I want to read data from a (very large, whitespace-separated, two-column) text file into a Python dictionary. I tried to do this with a for loop, but that was too slow. Much faster is reading it with numpy's loadtxt into a structured array and then converting it to a dictionary:

data = np.loadtxt('filename.txt', dtype=[('field1', 'a20'), ('field2', int)], ndmin=1)
result = dict(data)

But this is surely not the best way? Any advice?
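For reference, the plain-Python version I am replacing looks roughly like this, written as a dict comprehension (the function name and file path are made up; it also gives str keys rather than the bytes keys that dict(data) produces):

```python
def read_pairs(path):
    # Build {first_column: int(second_column)} from a whitespace-separated,
    # two-column text file, one line at a time.
    with open(path) as fh:
        return {key: int(value)
                for key, value in (line.split() for line in fh)}
```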

The main reason I need something else, is that the following does not work:

data[0]['field1'].split(sep='-')

It leads to the error message:

TypeError: Type str doesn't support the buffer API

If the split() method exists, why can't I use it? Should I use a different dtype? Or is there a different (fast) way to read the text file? Is there anything else I am missing?

Versions: Python 3.3.2, NumPy 1.7.1

Edit: changed data['field1'].split(sep='-') to data[0]['field1'].split(sep='-')

  • One of these days I am going to have to try and understand unicode... By the way, the right thing to do is to write the answer as a proper answer and accept it, not to include it within your question. Commented Jul 30, 2013 at 19:45

2 Answers


The standard library split returns a variable number of arguments, depending on how many times the separator is found in the string, and is therefore not very suitable for array operations. My char numpy arrays (I'm running 1.7) do not have a split method, by the way.

You do have np.core.defchararray.partition, which is similar but poses no problems for vectorization, as well as all the other string operations:

>>> a = np.array(['a - b', 'c - d', 'e - f'], dtype=np.string_)
>>> a
array(['a - b', 'c - d', 'e - f'], 
      dtype='|S5')
>>> np.core.defchararray.partition(a, '-')
array([['a ', '-', ' b'],
       ['c ', '-', ' d'],
       ['e ', '-', ' f']], 
      dtype='|S2')
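On Python 3 the '|S' array holds bytes, so the separator must be bytes as well. A minimal sketch of the same call in that setting, using np.char, the public alias for np.core.defchararray:

```python
import numpy as np

# Same example as above, but with bytes data and a bytes separator,
# which is what Python 3 requires for '|S' (bytes) arrays.
a = np.array([b'a - b', b'c - d', b'e - f'])
out = np.char.partition(a, b'-')  # one (before, sep, after) row per element
```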

7 Comments

Thank you for your answer Jaime. What I meant was data[0]['field1'].split(sep='-'), not data['field1'].split(sep='-'), although the latter would be brilliant if it existed and was fast. I edited my post above accordingly.
With my made-up example I can run a[0].split('-'), which should be equivalent to data['field1'][0].split(sep='-'), so reversing the order of your indices. How many -s are you expecting in your strings?
With your example I get:

>>> np.core.defchararray.partition(a, '-')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.3/site-packages/numpy/core/defchararray.py", line 1090, in partition
    _vec_string(a, object_, 'partition', (sep,)))
TypeError: expected bytes, bytearray or buffer compatible object
Then go with partition, and split all your strings with a single call.
actually, just b'a-b'.split(b'-') is OK.

Because type(data[0]['field1']) gives <class 'numpy.bytes_'>, the split() method does not work when it is given a "normal" (str) string as argument (is this a bug?).

the way I solved it: data[0]['field1'].split(sep=b'-') (the key to this is to put the b in front of '-')

And of course Jaime's suggestion to use the following was very helpful: np.core.defchararray.partition(a, '-') but also in this case b'-' is needed to make it work.

In fact, a related question was answered here: Type str doesn't support the buffer API although at first sight I did not realise this was the same issue.
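An alternative sketch (the sample data here is made up): decoding the bytes field to str first allows a plain str separator, at a small per-element conversion cost.

```python
import numpy as np

# A one-row structured array shaped like the loadtxt result in the question.
data = np.array([(b'a-b', 1)], dtype=[('field1', 'S20'), ('field2', int)])

# Decode bytes -> str, then split with an ordinary str separator.
parts = data[0]['field1'].decode().split('-')
```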

