How to load data and store the data from a file using NumPy

Question

I have a file like this:

2 qid:1 1:0.32 2:0.50 3:0.78 4:0.02 10:0.90
5 qid:2 2:0.22 5:0.34 6:0.87 10:0.56 12:0.32 19:0.24 20:0.55
...

The structure I want is as follows:

output=[]
rel=2
qid=1
features={} # the feature list "1:0.32 2:0.50 3:0.78 4:0.02 10:0.90"
output.append([rel,qid,features])
...

How can I write Python code to load the data? Thanks.

It would be helpful if you describe the desired output data structure.
– mtrw, Commented Mar 7, 2010 at 9:42

osdf · Accepted Answer · 2010-05-28 12:23:59Z

1

For reading use something like this (data is in file 'fname'):

with open(fname) as f:
    for line in f:
        elements = line.split()
        if not elements:
            continue  # skip blank lines
        rel = int(elements[0])
        qid = int(elements[1].split(':')[1])
        featurelist = elements[2:]
        # get the various features again with splitting at ':'
        # you get the idea ...
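The remaining step, splitting each feature at ':', could look roughly like the sketch below. The helper names `parse_line` and `load_file` are made up for illustration, and storing the features as a dict of index to value is an assumption, since the question does not pin down the desired structure.

```python
def parse_line(line):
    """Parse one 'rel qid:N idx:val ...' line into (rel, qid, features)."""
    elements = line.split()
    rel = int(elements[0])
    qid = int(elements[1].split(':')[1])
    # Assumption: features are kept as {feature index: value}.
    features = {}
    for item in elements[2:]:
        index, value = item.split(':')
        features[int(index)] = float(value)
    return rel, qid, features


def load_file(fname):
    """Return a list of (rel, qid, features) tuples, one per non-blank line."""
    output = []
    with open(fname) as f:
        for line in f:
            if line.strip():
                output.append(parse_line(line))
    return output
```

Calling `load_file(fname)` then gives one entry per line of the file, in file order.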

edited May 28, 2010 at 12:23

answered Mar 7, 2010 at 14:26


CTKlein · Accepted Answer · 2013-08-28 17:16:06Z

0

The following should work nicely and leave your data in a handy format (the path to the data is in file_name):

import numpy as np

regexp = r"(\d+)\s+qid:(\d+)\s+(.+)"
data = np.fromregex(file_name, regexp,
                    dtype=[('rel', int), ('qid', int), ('features', object)])

From here you can select rel, qid or features by indexing with the field name:

>>> data['rel']
array([2, 5])
>>> data['qid']
array([1, 2])
>>> data['features']
array(['1:0.32 2:0.50 3:0.78 4:0.02 10:0.90',
       '2:0.22 5:0.34 6:0.87 10:0.56 12:0.32 19:0.24 20:0.55'], dtype=object)
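Each entry of data['features'] is still a raw string, so it needs one more pass before it can be used numerically. The sketch below expands such strings into dense rows. The function name `features_to_rows` is hypothetical, and it assumes 1-based feature indices (as in the sample file) and that the total number of features is known in advance.

```python
def features_to_rows(feature_strings, n_features):
    """Expand 'idx:val idx:val ...' strings into dense rows.

    Assumptions: indices are 1-based and never exceed n_features;
    features that do not appear in a string are filled with 0.0.
    """
    rows = []
    for text in feature_strings:
        row = [0.0] * n_features
        for item in text.split():
            index, value = item.split(':')
            row[int(index) - 1] = float(value)
        rows.append(row)
    return rows
```

For the sample data, something like np.array(features_to_rows(data['features'], 20)) would then give a 2 x 20 float array.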

edited Aug 28, 2013 at 17:16

answered Jul 19, 2013 at 7:15


lmjohns3 · Accepted Answer · 2013-08-28 22:10:54Z

0

It looks like your input files are in svmlight format. If this is true, then there's a parser included as part of scikit-learn that might be handy to use -- see the source at:

https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/datasets/svmlight_format.py#L32
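As a rough sketch of that route, assuming scikit-learn is installed and the file really is valid svmlight format: the public entry point is `sklearn.datasets.load_svmlight_file`, and passing `query_id=True` makes it return the qid column as well. The wrapper name `load_ranking_file` is only for illustration.

```python
def load_ranking_file(path):
    """Load an svmlight-format file that carries qid fields.

    Returns (X, y, qid): X is a SciPy sparse matrix of features,
    y holds the leading relevance labels, qid the query ids.
    """
    # Third-party dependency, imported here so it is easy to spot.
    from sklearn.datasets import load_svmlight_file

    X, y, qid = load_svmlight_file(path, query_id=True)
    return X, y, qid
```

The features come back as a sparse matrix; X.toarray() converts it to a dense NumPy array if the data is small enough.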

answered Aug 28, 2013 at 22:10
