46

I have this simple code:

import re, sys

f = open('findallEX.txt', 'r')
lines = f.readlines()
match = re.findall('[A-Z]+', lines)
print match

I don't know why I am getting the error:

'expected string or buffer'

Can anyone help?

2
  • 1
    Replace f.readlines() by f.read(). Commented Mar 26, 2014 at 0:51
  • 1
    If lines were None you'd get the same error here as if you input a list. This would also occur with re.sub in the same circumstance. Hence it being a TypeError (the wrong type being entered). I just mention this because I searched for what caused this error and found your post (and I had a Nonetype on accident). Commented Jul 26, 2014 at 8:20

5 Answers 5

37

lines is a list. re.findall() doesn't take lists.

>>> import re
>>> f = open('README.md', 'r')
>>> lines = f.readlines()
>>> match = re.findall('[A-Z]+', lines)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/usr/lib/python2.7/re.py", line 177, in findall
    return _compile(pattern, flags).findall(string)
TypeError: expected string or buffer
>>> type(lines)
<type 'list'>

From help(file.readlines). I.e. readlines() is for loops/iterating:

readlines(...)
    readlines([size]) -> list of strings, each a line from the file.

To find all uppercase characters in your file:

>>> import re
>>> re.findall('[A-Z]+', open('README.md', 'r').read())
['S', 'E', 'A', 'P', 'S', 'I', 'R', 'C', 'I', 'A', 'P', 'O', 'G', 'P', 'P', 'T', 'V', 'W', 'V', 'D', 'A', 'L', 'U', 'O', 'I', 'L', 'P', 'A', 'D', 'V', 'S', 'M', 'S', 'L', 'I', 'D', 'V', 'S', 'M', 'A', 'P', 'T', 'P', 'Y', 'C', 'M', 'V', 'Y', 'C', 'M', 'R', 'R', 'B', 'P', 'M', 'L', 'F', 'D', 'W', 'V', 'C', 'X', 'S']
Sign up to request clarification or add additional context in comments.

Comments

7

lines is a list of strings, re.findall doesn't work with that. try:

import re, sys

f = open('findallEX.txt', 'r')
lines = f.read()
match = re.findall('[A-Z]+', lines)
print match

Comments

4

readlines() will return a list of all the lines in the file, so lines is a list. You probably want something like this:

for line in f.readlines(): # Iterates through every line and looks for a match
#or
#for line in f:
    match = re.findall('[A-Z]+', line)
    print match

Or, if the file isn't too large you can grab it as as single string:

lines = f.read() # Warning: reads the FULL FILE into memory. This can be bad.
match = re.findall('[A-Z]+', lines)
print match

1 Comment

Actually, you can (and even should) skip readlines altogether: for line in f:
3

'lines' term from your snippet consists of set of strings.

 lines = f.readlines()
 match = re.findall('[A-Z]+', lines)

You cannot send entire lines into the re.findall('pattern',<string>)

You can try to send line by line

 for i in lines:
  match = re.findall('[A-Z]+', i)
  print match

or to convert the entire lines collection into single line (each line seperated by space)

 NEW_LIST=' '.join(lines)
 match=re.findall('[A-Z]+' ,NEW_LIST)
 print match

This might help you

Comments

1

re.findall finds all the occurrence of the regex in a string and return in a list. Here, you are using a list of strings, you need this to use re.findall

Note - If the regex fails, an empty list is returned.

import re, sys

f = open('picklee', 'r')
lines = f.readlines()  
regex = re.compile(r'[A-Z]+')
for line in lines:
     print (re.findall(regex, line))

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.