I have a CSV file that I read into an array. The original CSV is a 5-row, 8-column file with some empty elements:

       1  2         3        4    5    6   7         8     
Row 1:   '1 1'     '4 4'                  '2 2'   
Row 2:   '3'       '3'                    '3' 
Row 3:   '1 1 1 1' '1 1 1 1'              '2 2 2 2'
Row 4:   '2'       '4'                    '2' 
Row 5:   '4'       '4'                               '4'

I read it into my code:

[[nan '1 1' '4 4' nan nan nan '2 2' nan]
 [nan '3' '3' nan nan nan '3' nan]
 [nan '1 1 1 1' '1 1 1 1' nan nan nan '2 2 2 2' nan]
 [nan '2' '4' nan nan nan '2' nan]
 [nan '4' '4' nan nan nan nan '4']]

What I want is to replace each empty element with a string of -1 values, using as many -1s as there are values in the other elements of the same row:

[['-1 -1' '1 1' '4 4' '-1 -1' '-1 -1' '-1 -1' '2 2' '-1 -1']
 ['-1' '3' '3' '-1' '-1' '-1' '3' '-1']
 ['-1 -1 -1 -1' '1 1 1 1' '1 1 1 1' '-1 -1 -1 -1' '-1 -1 -1 -1' '-1 -1 -1 -1' '2 2 2 2' '-1 -1 -1 -1']
 ['-1' '2' '4' '-1' '-1' '-1' '2' '-1']
 ['-1' '4' '4' '-1' '-1' '-1' '-1' '4']]

When I use re.match("\d", element), I cannot get the result. Could anyone help?
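
A minimal sketch of why the regex call fails, assuming (as the printed array and the comments suggest) that each empty cell is a float NaN rather than a string. The helper names is_empty and count_values are hypothetical, introduced only for illustration:

```python
import math
import re

def is_empty(cell):
    """True for the float NaN that stands in for an empty CSV cell."""
    return isinstance(cell, float) and math.isnan(cell)

def count_values(cell):
    """Number of whitespace-separated values in a string cell."""
    return len(cell.split())

nan = float("nan")  # assumption: empty cells are float NaN

try:
    re.match(r"\d", nan)  # re.match needs a string, so a float raises TypeError
    regex_failed = False
except TypeError:
    regex_failed = True
```

Testing the cell's type first, and only then splitting the string cells, avoids handing a float to re.match at all.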

10
  • Duplicate of: stackoverflow.com/questions/1540049/… Commented Jan 11, 2016 at 11:53
  • @C.LECLERC, thanks for your link, but I think my question is different: I need to replace each element with a specific number of -1s, so I need to work out how many values each item contains. Commented Jan 11, 2016 at 11:56
  • What is nan? Some sort of constant? Commented Jan 11, 2016 at 11:58
  • I might be misunderstanding, but in the example isn't "-1 -1" a single string element (even if it describes 2 numeric values)? Commented Jan 11, 2016 at 11:59
  • @PaulRooney, the original CSV file has empty elements. When I read the file into my code, they display as nan. Commented Jan 11, 2016 at 11:59

3 Answers

1

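What about:

for line in csvdata:
    # Largest number of space-separated values among the string cells of this row
    multiplicity = max([len(datum.split(" ")) if isinstance(datum, str) else 0 for datum in line])
    for i, datum in enumerate(line):
        # Non-string cells (the NaN floats) are the empty ones
        if not isinstance(datum, str):
            # Assign through the index; rebinding datum would leave line unchanged
            line[i] = " ".join(["-1"] * multiplicity)

It looks awful to me, but it should work.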
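
The same idea can be sketched as a standalone function that builds new rows instead of modifying them in place. This assumes every empty cell is a non-string value (here a float NaN); the name fill_row and the two-row sample are illustrative, not part of the original code:

```python
def fill_row(row, filler="-1"):
    """Return a new list with every non-string cell replaced by repeated filler."""
    # Largest number of space-separated values among the string cells
    multiplicity = max(
        (len(cell.split()) for cell in row if isinstance(cell, str)),
        default=0,
    )
    replacement = " ".join([filler] * multiplicity)
    return [cell if isinstance(cell, str) else replacement for cell in row]

nan = float("nan")  # assumption: empty cells are float NaN
csvdata = [
    [nan, "1 1", "4 4", nan, nan, nan, "2 2", nan],
    [nan, "3", "3", nan, nan, nan, "3", nan],
]
filled = [fill_row(row) for row in csvdata]
```

Returning a new list sidesteps the question of whether the loop variable or the underlying row gets updated. A row with no string cells at all ends up with empty strings.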

5 Comments

There is an error because datum.split(" ") does not work here.
multiplicity = max([len(datum.split(" ")) for datum in line]) AttributeError: 'float' object has no attribute 'split'
I edited my answer, but the code is uglier than before.
I re-edited my answer to add the NaN test condition. It assumes that every value that is not a string must be replaced with the corresponding "-1" string.
I modified your code, using a temporary variable to store the datum and writing it into a new line, so now I get the result.
0

Try this:

xs=[["nan '1 1' '4 4' nan nan nan '2 2' nan"],
    ["nan '3' '3' nan nan nan '3' nan"],
    ["nan '1 1 1 1' '1 1 1 1' nan nan nan '2 2 2 2' nan"],
    ["nan '2' '4' nan nan nan '2' nan"],
    ["nan '4' '4' nan nan nan nan '4'"]]

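for x in xs:
    # Character count of the first quoted item with spaces removed (assumes single-digit values)
    s = len(x[0].replace('nan','').replace(' ','').split("''")[0])-1
    # Build "-1 -1 ..." with s repetitions
    r = ' '.join('v'*s).replace('v', '-1')
    r = "'%s'" % r
    x[0] = x[0].replace('nan', r)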
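
A hedged variant of the same string-based approach, which counts the values in the first quoted item rather than its characters, so multi-digit numbers would also be handled. It assumes each row is one string laid out as above; the function name fill_nan_tokens is hypothetical:

```python
import re

def fill_nan_tokens(row_text, filler="-1"):
    """Replace each standalone 'nan' token with a quoted run of filler values."""
    # The first quoted item decides how many values each cell should hold
    first = re.search(r"'([^']*)'", row_text)
    count = len(first.group(1).split()) if first else 0
    replacement = "'%s'" % " ".join([filler] * count)
    # \b keeps the match to whole 'nan' tokens only
    return re.sub(r"\bnan\b", replacement, row_text)

row = "nan '1 1' '4 4' nan nan nan '2 2' nan"
filled_row = fill_nan_tokens(row)
```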

0

I believe you should make clear in your question that you are using a library (NumPy). Most of the solutions will work in plain Python, but since you are already using NumPy, I believe this is a better solution:

import numpy as np
import pandas as pd
x = np.asarray(pd.read_csv("data/org8.csv"))
x[pd.isnull(x)] = -1  # pd.isnull also handles object arrays, where np.isnan raises TypeError
