The situation is as follows. Consider this piece of code:

import re

content = ''
count = len(re.split(r'\W+', content, flags=re.UNICODE))

print(count)

# Output is expected to be 0, as it has no words
# Instead output is 1

What is going wrong? All other word counts are correct.

EDIT: The count is also wrong (non-zero) when content = '..' or content = '.!', so this is NOT a problem with Python's str.split() method but with regular expressions from re.
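To see what is actually being counted in these cases, it helps to print the lists themselves. A minimal sketch (the sample strings are only illustrative): when the string consists only of non-word characters, `\W+` matches all of it and `re.split()` returns the empty text on each side of that match, so `'..'` and `'.!'` give a count of 2, not 0.

```python
import re

samples = ['..', '.!', 'hello world!']

# \W+ matches runs of non-word characters. A separator at the very start
# or end of the string produces an empty string on that side of the split.
results = {s: re.split(r'\W+', s) for s in samples}

for s, parts in results.items():
    print(repr(s), parts, len(parts))
```

The trailing `'!'` in `'hello world!'` likewise adds an empty string at the end, so that input is counted as 3 words.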

IMPORTANT NOTE: Although the solution I gave works in my particular case, it is not yet a correct general solution, because the underlying regex issue isn't 100% SOLVED!

  • Not a duplicate, because re is an independent library and is expected to return only words, filtering out the case mentioned in the other post. Commented Mar 13, 2015 at 1:45
  • How can it return only word characters when the input string is empty? Commented Mar 13, 2015 at 1:47
  • Possible duplicate: stackoverflow.com/questions/28970724/python-split-empty-string Commented Mar 13, 2015 at 1:48
  • No, @AvinashRaj. In the link you mentioned, a list with an empty string is returned when the input is an empty string; that works correctly with the split function, but it doesn't when the string is "\n". This is something that has to do with the re.split() function. Commented Mar 13, 2015 at 1:54
  • It has to do with this: docs.python.org/2/library/re.html#re.split Commented Mar 13, 2015 at 1:56
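The difference discussed in these comments can be checked directly. A small sketch contrasting the two functions: `str.split()` called with no argument splits on runs of whitespace and discards empty strings, whereas `re.split()` always keeps the empty strings produced around a match.

```python
import re

# str.split() with no separator argument drops empty strings,
# so an empty or whitespace-only string gives an empty list.
no_arg = ''.split()             # []
newline_no_arg = '\n'.split()   # []

# With an explicit separator, str.split() keeps empty strings.
explicit = ''.split(' ')        # ['']

# re.split() never drops empty strings: '\n' matches \W+ entirely,
# leaving an empty string before and after the match.
regex_newline = re.split(r'\W+', '\n')  # ['', '']
```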

1 Answer


Well, I found out what the reason is:

re.split() splits a string at every match of the given regular expression and returns a list of the pieces. If the pattern does not match at all (which is always the case for an empty string), there is nothing to split, so it returns a list containing the whole input string: for '' that is ['']. So len() counts a list with 1 element.
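A quick sketch of that no-match rule, using two illustrative inputs: whenever `\W+` finds nothing to split on, the result is a one-element list holding the input unchanged, whether the input is a single word or the empty string.

```python
import re

# No non-word characters in either input, so \W+ never matches and
# re.split() returns a one-element list with the whole input string.
single_word = re.split(r'\W+', 'hello')  # ['hello']
empty = re.split(r'\W+', '')             # ['']

print(single_word, len(single_word))
print(empty, len(empty))
```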

A solution to this is the following piece of code:

import re

content = ''
# re.split() returns [''] for an empty string, so handle that case separately
count = 0 if content == '' else len(re.split(r'\W+', content, flags=re.UNICODE))

print(count)

# Output is as expected, 0, by using a simple conditional
# that checks whether the string is empty: when it is, it returns 0,
# otherwise it returns the word count.
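The empty-string check above does not cover inputs such as '..' or '.!', which still produce empty strings. A possible more general approach, offered as a sketch rather than the answer's own method: count the matches of `\w+` with `re.findall()`, or keep `re.split()` but discard the empty pieces. The helper names `count_words` and `count_words_split` are hypothetical. Note that in Python 3 `\w` also matches digits and underscores, and an apostrophe splits "don't" into two words.

```python
import re


def count_words(text: str) -> int:
    """Count maximal runs of word characters in text (hypothetical helper)."""
    # re.findall() returns only the matches, so an empty string or a
    # string made only of punctuation yields an empty list.
    return len(re.findall(r'\w+', text))


def count_words_split(text: str) -> int:
    """Same count via re.split(), dropping the empty strings it produces."""
    return len([part for part in re.split(r'\W+', text) if part])


for sample in ['', '..', '.!', 'hello world!']:
    print(repr(sample), count_words(sample), count_words_split(sample))
```

Both functions give the same result, because the non-empty pieces of `re.split(r'\W+', ...)` are exactly the runs that `\w+` matches.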