For my program I have a function that changes a string into a list of words. However, when it hits a newline character, it joins the two words on either side of the newline. Example:

"newline\n   problem"

It prints out like this in the main function:

print(seperate_words)
newlineproblem

Here is the code:

def stringtolist(lines):
    # string of acceptable characters
    acceptable = "1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'’- " 
    new_string = ''
    for i in lines:
        # runs through the string and checks to see what characters are in the string
        if i in acceptable:
            i = i.lower()
            # if it is an acceptable character it is added to new string
            new_string += i
        elif i == '.""':
            # if it is a period or quotation marks it is replaced with a space in the new string
            new_string += ' '
        else:
            # for every other character it is removed and not added to new string
            new_string += ''


    #splits the string into a list
    seperate_words = new_string.split(' ')
    return seperate_words 
  • What are you trying to do? What is the separator? Commented Mar 22, 2015 at 20:39
  • Just FYI: in for i in lines: if i in acceptable: i = i.lower(), the assignment does not modify your string. In Python every name is a reference, so rebinding a name to something else leaves the originally referenced object unchanged. Methods are the most common way of mutating mutable objects (and strings, by the way, are immutable in Python). Commented Mar 22, 2015 at 20:42
  • Why don't you use the split method? Commented Mar 22, 2015 at 20:46
  • Use line_break = "\n" and then lines = lines.replace(line_break, "") Commented Mar 22, 2015 at 20:49
  • @MTaqi Thanks, that worked: replacing the newlines with a space right at the beginning keeps the program from joining the two words. Commented Mar 22, 2015 at 21:25
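
Putting the comments together, here is a minimal sketch of the question's function with newlines converted to spaces up front, as the last two comments describe. Two things here are assumptions rather than the asker's code: the membership test ch in '."' is a guess at what the original elif i == '.""' intended, and split() without arguments is swapped in for split(' ') so that runs of spaces do not produce empty strings.

```python
ACCEPTABLE = "1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'’- "

def stringtolist(lines):
    # turn newlines into spaces first so neighbouring words stay apart
    lines = lines.replace("\n", " ")
    new_string = ''
    for ch in lines:
        if ch in ACCEPTABLE:
            # acceptable characters are kept, lowercased
            new_string += ch.lower()
        elif ch in '."':
            # assumed intent: periods and double quotes become spaces
            new_string += ' '
        # every other character is dropped
    # split() with no argument collapses runs of whitespace
    return new_string.split()

print(stringtolist("newline\n   problem"))  # ['newline', 'problem']
```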

3 Answers

1

You can split a string on multiple delimiters:

import re

def stringtolist(the_string):
    # split on a space, a period, or a newline
    return re.split(r'[ .\n]', the_string)

You can add other delimiters to the character class if you want (quotes, for example): re.split(r'[ .\n\'"]', the_string)
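
One thing to watch with a single-character split like this: every delimiter counts separately, so consecutive delimiters yield empty strings. The sketch below shows that effect and one way around it. The helper name split_words, the + quantifier, and the final filter are illustrative additions, not part of the answer above.

```python
import re

def split_words(the_string):
    # '+' treats a run of delimiters as a single separator
    parts = re.split(r'[ .\n]+', the_string)
    # a leading or trailing delimiter still yields '' at the ends, so drop those
    return [p for p in parts if p]

# each of the four delimiters (newline plus three spaces) splits on its own
print(re.split(r'[ .\n]', "newline\n   problem"))  # ['newline', '', '', '', 'problem']
print(split_words("newline\n   problem."))          # ['newline', 'problem']
```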

0

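You can check each character for the newline and handle it explicitly. Here's an example that skips it (append a space instead if the newline is the only thing separating two words):

newstring = ''
for ch in string:
    # copy every character except the newline
    if ch != '\n':
        newstring += ch

Or use .strip() to remove leading and trailing newlines; it leaves newlines inside the string untouched.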
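
To make the difference concrete, here is a short comparison of the string methods involved. The sample text is made up for illustration. Note that str.strip() only trims the ends, while replace() and an argument-less split() also deal with newlines in the middle.

```python
text = "\nnewline\nproblem\n"

# strip() trims whitespace at both ends only; the inner newline survives
stripped = text.strip()
# replace() swaps every newline for a space, wherever it occurs
replaced = text.replace("\n", " ")
# split() with no argument splits on any run of whitespace, newlines included
words = text.split()

print(repr(stripped))  # 'newline\nproblem'
print(repr(replaced))  # ' newline problem '
print(words)           # ['newline', 'problem']
```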

0

Because of the multiple transformations described in the comments of the original code, a more flexible approach could be to use the translate() method of strings together with the string.maketrans() function (this is Python 2 code):

def stringtolist(lines):
    import string
    acceptable_chars = string.ascii_letters + string.digits + "'`- "
    space_chars = '."'
    # delete every character that is neither kept nor turned into a space
    delete_chars = ''.join(set(map(chr, xrange(256))) - set(acceptable_chars + space_chars))
    # lowercase the kept characters and map space_chars to spaces
    table = string.maketrans(acceptable_chars + space_chars, acceptable_chars.lower() + (' '*len(space_chars)))
    return lines.translate(table, delete_chars).split()
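
string.maketrans() and xrange() exist only in Python 2, and Python 3's str.translate() takes a single table argument. A rough Python 3 counterpart might look like the sketch below. It is not the answerer's code: the _DropMissing helper is a hypothetical name, and adding the newline to space_chars is an assumption made so that the words on either side of a newline stay apart.

```python
import string

class _DropMissing(dict):
    """Translation table that deletes any character it has no entry for."""
    def __missing__(self, key):
        # str.translate() removes characters whose mapping is None
        return None

def stringtolist(lines):
    acceptable_chars = string.ascii_letters + string.digits + "'`- "
    # assumption: newline is added here so it becomes a space, not a deletion
    space_chars = '."\n'
    # str.maketrans() needs two strings of equal length
    table = _DropMissing(str.maketrans(
        acceptable_chars + space_chars,
        acceptable_chars.lower() + ' ' * len(space_chars)))
    return lines.translate(table).split()

print(stringtolist("newline\n   problem"))  # ['newline', 'problem']
```

Subclassing dict is a design choice: enumerating every character to delete, as the 256-character set above does, is not practical for the full Unicode range.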
