I'm using the NumPy Python library to run large-scale edits on a .csv file, with this code:

import numpy as np

def main():
    try:
        e,a,ad,c,s,z,ca,fn,ln,p,p2,g,ssn,cn,com,dob,doh,em = np.loadtxt('c:\wamp\www\_quac\carryover_data\SI\Employees.csv',delimiter=',',unpack=True,dtype='str')

        x=0
        dob = dob.split('/')
        for digit in dob:
            if len(digit) == 1:
                digit = str('0'+digit)
        dob = str(dob[2]+'-'+dob[0]+'-'+dob[1])

        doh = doh.split('/')
        for digit in doh:
            if len(digit) == 1:
                digit = str('0'+digit)
        doh = str(doh[2]+'-'+doh[0]+'-'+doh[1])

        for eID in e:
            saveLine=eID+','+a[x]+','+ad[x]+','+c[x]+','+s[x]+','+z[x]+','+ca[x]+','+fn[x]+','+ln[x]+','+p[x]+','+p2[x]+','+g[x]+','+ssn[x]+','+cn[x]+','+com[x]+','+dob[x]+','+doh[x]+','+em[x]+'\n'
            saveFile = open('fixedEmployees.csv','a')
            saveFile.write(saveLine)
            saveFile.close()
            x+=1


    except Exception, e:
        print str(e)

main()

dob and doh each contain date strings such as 4/26/2012, and I'm trying to convert them to the MySQL-friendly DATE format, e.g. 2012-04-26. The error printed when I run this script is:

cannot set an array element with a sequence

It does not specify a line, so I don't know what this really means. I'm pretty new to Python; I've checked other questions with this same error but I can't make sense of their code. Any help is much appreciated.

  • Also: perhaps this would be more easily done in Excel? How could I do that? Commented May 13, 2014 at 5:52
  • Please remove the try: except clause and re-run your code to get the line and traceback for the error. Commented May 13, 2014 at 6:54
  • @ebarr doing so pointed me to the line with e,a,ad,c,s,z,ca,fn,ln,p,p2 etc. and cited the same exact error... Commented May 13, 2014 at 7:36
  • If you don't use a tuple unpack on the np.loadtxt command, does it execute without error? Commented May 13, 2014 at 10:28
  • @ebarr yeah, it still did. I managed to get things to work, although it was somewhat of a workaround. I'm posting the answer below. Commented May 13, 2014 at 18:21

2 Answers

1

Try using zfill to reformat the date string so you can have a '0' before your '4'. (zfill pads a string on the left with zeros to fill the width.)

doh = '4/26/2012'
doh = doh.split('/')

for i, s in enumerate(doh):
    doh[i] = s.zfill(2)

doh = doh[2]+'-'+doh[0]+'-'+doh[1]

# result: '2012-04-26'

As for the "cannot set an array element with a sequence" error, it would help to know where it occurs. I'm guessing there is something wrong with the structure of the array.
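Note that in the question dob and doh are whole columns (arrays) rather than single strings, so the conversion has to run once per element. A minimal sketch of that, assuming the dates are always month/day/year; the helper name to_mysql_date and the sample values are made up for illustration:

```python
def to_mysql_date(text):
    """Convert 'm/d/yyyy' to 'yyyy-mm-dd'; return anything else unchanged."""
    parts = text.strip().split('/')
    # Blanks, header text and other non-date values are passed through as-is.
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        return text
    month, day, year = parts
    return year + '-' + month.zfill(2) + '-' + day.zfill(2)


# Hypothetical sample column, including a blank and a header cell.
dob = ['4/26/2012', '12/1/1999', '', 'DOB']
fixed = [to_mysql_date(d) for d in dob]
print(fixed)
```

The list comprehension works the same way on a NumPy string array, since it only iterates over the elements.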

0

OK, to solve it I had to do a couple of things. After removing the try/except block, I found that the error was on line 5, the line with e,a,ad,c,s etc. I couldn't eliminate the problem until I copied just the 2 columns I wanted to work on into a separate file and wrote a new program to deal with those.

Then I had to write a .txt instead of a .csv, because Excel auto-formats the dates when it opens a .csv and changes the values before I can even touch them. As far as I've learned there is no way around that: you can't turn the date auto-format off, which is a serious problem with Excel. So here's my NumPy script (it changes the first column and keeps the second the same):

import numpy as np

def main():
    # Load the two date columns as strings; unpack=True returns one array per column.
    dob, doh = np.loadtxt('temp.csv',
                          delimiter=',',
                          unpack=True,
                          dtype='str')

    # Open the output once (append mode) instead of reopening it for every row.
    with open('__newTemp.txt', 'a') as saveFile:
        for eachDate, hireDate in zip(dob, doh):
            parts = eachDate.split('/')
            if any(c.isalpha() for c in eachDate) or len(parts) != 3:
                # Header text, blanks and anything that is not m/d/yyyy pass through unchanged.
                newDate = eachDate
            else:
                # Zero-pad month and day, then reorder to yyyy-mm-dd.
                month, day, year = [p.zfill(2) for p in parts]
                newDate = year + '-' + month + '-' + day

            print(eachDate + '--->' + newDate)

            # Write the edited first column next to the untouched second column.
            saveFile.write(newDate + ',' + hireDate + '\n')


main()

I then used Data->Import from text with the "TEXT" format option in Excel to get the column into my .csv. I realize this is probably bulky and noobish but it got the job done :3
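For anyone who would rather skip both NumPy and the Excel round trip, here is a rough sketch using only the standard library csv module (Python 3). It is not what I actually ran. The column positions 15 and 16 are an assumption taken from the order of the names in the question's unpack (dob, then doh), and the function names are made up:

```python
import csv
from datetime import datetime


def to_mysql_date(text):
    """Return 'yyyy-mm-dd' for an 'm/d/yyyy' string, or the input unchanged."""
    try:
        return datetime.strptime(text.strip(), '%m/%d/%Y').date().isoformat()
    except ValueError:
        # Blank cells, header text and other non-dates are left alone.
        return text


def fix_date_columns(src_path, dst_path, date_cols=(15, 16)):
    """Copy src_path to dst_path, rewriting only the columns in date_cols."""
    with open(src_path, newline='') as src, \
         open(dst_path, 'w', newline='') as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            for i in date_cols:
                if i < len(row):  # tolerate short rows
                    row[i] = to_mysql_date(row[i])
            writer.writerow(row)
```

It would be called as something like fix_date_columns('Employees.csv', 'fixedEmployees.csv'). The csv module also copes with quoted fields that contain commas, which a plain split on ',' does not.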

1 Comment

To summarize, I believe one of the columns in my original Excel file was incompatible with NumPy's loadtxt().
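One way to test that guess would be to count the fields on each line and look for rows that disagree with the first one. This is only a diagnostic sketch, not a confirmed cause: it assumes the loader splits each line on the delimiter without honoring quotes, and find_ragged_rows is a made-up name:

```python
def find_ragged_rows(path, delimiter=','):
    """Return (line_number, field_count) for rows whose count differs from the first row."""
    ragged = []
    expected = None
    with open(path) as f:
        for lineno, line in enumerate(f, start=1):
            # Naive split on the delimiter, ignoring any quoting.
            count = len(line.rstrip('\r\n').split(delimiter))
            if expected is None:
                expected = count  # the first line sets the expected width
            elif count != expected:
                ragged.append((lineno, count))
    return ragged
```

Any line it reports (for example an address with a comma inside it, or a row with a missing cell) would be worth inspecting before loading the file.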
