I am trying to use the Python csv reader for the first time. I have a function that asks the user to select the file they want to parse and then passes that file's path to the parse function:

def parse(filename):
        parsedFile = []
        with open(filename, 'rb') as csvfile:
                dialect = csv.Sniffer().sniff(csvfile.read(), delimiters=';,|')
                csvfile.seek(0)
                reader = csv.reader(csvfile, dialect)

                for line in reader:
                    parsedFile.append(line)
                return(parsedFile)

def selectFile():
        print('start selectFile method')
        localPath = os.getcwd() + '\Files'
        print(localPath)
        for fileA in os.listdir(localPath):
                print (fileA)

        test = False
        while test == False:
                fileB = input('which file would you like to DeID? \n')
                conjoinedPath = os.path.join(localPath, fileB)
                test = os.path.isfile(conjoinedPath)


        userInput = input('Please enter the number corresponding to which client ' + fileB + ' belongs to. \n\nAcceptable options are: \n1.A \n2.B \n3.C \n4.D \n5.E \n')
        client = ''
        if (userInput == '1'):
                client = 'A'
        elif (userInput == '2'):
                client = 'B'
        elif (userInput == '3'):
                client = 'CServices'
        elif (userInput == '4'):
                client = 'D'
        elif (userInput == '5'):
                client = 'E'
        return(client, conjoinedPath)



def main():
       x, y = selectFile() 
       parse(y)


if __name__ == '__main__':
        main()

All of it seems to be working as intended, but I am getting a:

TypeError: can't use a string pattern on a bytes-like object 

when trying to open filename (line 3 in the code). I have tried converting filename to both a string type and a bytes type, and neither seems to work.

Here is the output:

>>> 
start selectFile method
C:\PythonScripts\DeID\Files
89308570_201601040630verifyppn.txt
89339985_201601042316verifyppn.txt
which file would you like to DeID? 
89339985_201601042316verifyppn.txt
Please enter the number corresponding to which client 89339985_201601042316verifyppn.txt belongs to. 

Acceptable options are: 
1.Client A
2.Client B
3.Client C
4.Client D
5.Client E
3
Traceback (most recent call last):
  File "C:\PythonScripts\DeID\DeIDvA1.py", line 107, in <module>
    main()
  File "C:\PythonScripts\DeID\DeIDvA1.py", line 103, in main
    parse(y)
  File "C:\PythonScripts\DeID\DeIDvA1.py", line 63, in parse
    dialect = csv.Sniffer().sniff(csvfile.read(), delimiters=';,|')
  File "C:\Python34\lib\csv.py", line 183, in sniff
    self._guess_quote_and_delimiter(sample, delimiters)
  File "C:\Python34\lib\csv.py", line 224, in _guess_quote_and_delimiter
    matches = regexp.findall(data)
TypeError: can't use a string pattern on a bytes-like object
>>> 

I am not sure what I am doing wrong.

1 Answer

The filename is not to blame here; the problem is that you are opening the file with:

with open(filename, 'rb') as csvfile:

The 'rb' mode opens the file in binary mode, so its contents are read as bytes objects. From the documentation:

'b' appended to the mode opens the file in binary mode: now the data is read and written in the form of bytes objects. This mode should be used for all files that don’t contain text.

You then pass those bytes to csv.Sniffer().sniff(), which searches the sample with string (str) regular-expression patterns, and, as the TypeError points out, a string pattern cannot be applied to a bytes-like object.
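The mismatch can be reproduced without the csv module at all. This is only a minimal sketch: the pattern below is a made-up stand-in, not the regular expression the Sniffer actually builds, and the exact wording of the error message varies between Python 3 versions.

```python
import re

# Hypothetical stand-in for the str patterns csv.Sniffer compiles internally.
pattern = re.compile(r'[;,|]')

# A str pattern applied to str data works.
found = pattern.findall('a;b,c')

# The same pattern applied to bytes (what read() returns in 'rb' mode) fails.
error = None
try:
    pattern.findall(b'a;b,c')
except TypeError as exc:
    error = exc

print(found)
print(error)
```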

Removing the b from the mode and opening the file with 'r' will do the trick.
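For reference, here is a sketch of how the parse function from the question might look with that change on Python 3. It assumes the file is text in the platform's default encoding; the newline='' argument is an addition that is not in the original code and leaves line-ending handling to the csv module.

```python
import csv

def parse(filename):
    """Read a delimited text file into a list of rows (lists of strings)."""
    # 'r' (text mode) makes read() return str, which is what Sniffer expects.
    # newline='' is an assumption added here; it lets csv handle line endings.
    with open(filename, 'r', newline='') as csvfile:
        dialect = csv.Sniffer().sniff(csvfile.read(), delimiters=';,|')
        csvfile.seek(0)  # rewind so the reader starts at the first row
        reader = csv.reader(csvfile, dialect)
        return [line for line in reader]
```

If the file is not in the default encoding, an explicit encoding argument to open would also be needed; which one depends on the data.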


Note: Python 2.x doesn't exhibit this behavior on Unix machines. This is a result of the segregation of bytes and str objects as distinct types in 3.x.

1 Comment

Note: You should also be passing newline='' to open on Py3, to prevent line ending conversions (csv handles this itself, because line endings are part of the CSV dialect) and to allow newlines to appear in quoted fields properly. In Py2, you opened in binary mode for the same reason.
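A small sketch of the difference the comment describes, using a throwaway temporary file (the file name and sample rows are made up for illustration). A quoted field containing a Windows-style line break survives a round trip only when the file is read with newline=''.

```python
import csv
import os
import tempfile

# Sample data: the second row has a line break inside a field.
rows = [['id', 'note'], ['1', 'first line\r\nsecond line']]

path = os.path.join(tempfile.mkdtemp(), 'demo.csv')
with open(path, 'w', newline='') as f:
    csv.writer(f).writerows(rows)

# newline='' passes line endings through untouched, so csv sees the real data.
with open(path, 'r', newline='') as f:
    with_newline_arg = list(csv.reader(f))

# Default text mode translates '\r\n' to '\n' before csv ever sees it.
with open(path, 'r') as f:
    default_mode = list(csv.reader(f))

print(with_newline_arg == rows)
print(default_mode == rows)
```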
