Append string based on condition python

Question

I want to append strings to a list based on a condition. For example, strings starting with http should not be appended, but every other string with a length of 40 should be.

    words = []
    store1 = []
    disregard = ["http","gen"]

    for all in glob.glob(r'MYDIR'):
        with open(all, "r",encoding="utf-16") as f:
            text = f.read()
        lines = text.split("\n")

        for each in lines:
            words += each.split()
        for each in words:
            if len(each) == 40 and each not in disregard:
                store1.append(each)

Update:

    if disregard[0] not in each:

works, but how can I compare against every item in my list? Using disregard on its own doesn't work. Here is my input text file:

http://1234ashajkhdajkhdajkhdjkaaaaaaad1
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
genp://1234ashajkhdajkhdajkhdjkaaaaaaad1
a\a

The only thing that will append will be "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"

I didn't find anything wrong in your code. Try changing each not in disregard to all([word not in each for word in disregard]). When you split into words, "http" does not stand by itself; it is part of a token like "http://blablabla.com" because there is no space after it, so each not in disregard returns True.
– fahadh4ilyas, Commented May 23, 2018 at 3:44
TypeError: 'str' object is not callable when I tried replacing it.
– Cheers, Commented May 23, 2018 at 4:36
Ah... it's because you are using all as a variable in for all in glob.glob(r'MYDIR'). Better to rename it, because all is a built-in Python function.
– fahadh4ilyas, Commented May 23, 2018 at 4:39
You could add an example of data, the output you currently get from it, and the output you want from it. This would make your question clearer.
– zezollo, Commented May 23, 2018 at 4:57
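
The two points raised in these comments can be reproduced in a short sketch. It assumes the tokens from the question's input file are already in a list; `contains_any` and `shadowing_demo` are hypothetical helper names used only for illustration.

```python
disregard = ["http", "gen"]
tokens = [
    "http://1234ashajkhdajkhdajkhdjkaaaaaaad1",
    "a" * 40,
    "genp://1234ashajkhdajkhdajkhdjkaaaaaaad1",
]

# List membership compares whole strings, so no token is "in" disregard.
whole_match = [t in disregard for t in tokens]


def contains_any(token, fragments):
    """Return True if any fragment occurs anywhere inside token."""
    return any(fragment in token for fragment in fragments)


# Substring test against every entry of disregard, as the comment suggests.
kept = [t for t in tokens if len(t) == 40 and not contains_any(t, disregard)]


def shadowing_demo():
    """Rebind `all` locally, as `for all in glob.glob(...)` does, then call it."""
    all = "some_file.txt"  # hides the built-in all() inside this function
    try:
        all([True, True])
    except TypeError as exc:
        return str(exc)
    return None
```

Renaming the loop variable (for example to `fname`) leaves the built-in `all` usable again.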

Andrei Damian-Fekete · Accepted Answer · 2018-05-23 07:28:25Z

1

The best approach depends on how many words you want to disregard, and on what exactly counts as a word: for example, if a word ends with spaces, should they be stripped? One solution is to build a regular expression from all your words and match each line against it. Because re.match only matches at the start of the string, this rejects lines that begin with any of the disregarded words.

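import glob
import re

disregard = ["http","gen"]
# Alternation of the escaped words, here "http|gen"
pattern = "|".join([re.escape(w) for w in disregard])
# The loop variable is not named `all`, which would shadow the built-in
for fname in glob.glob(r'MYDIR/*'):
    with open(fname, "r", encoding="utf-16") as f:
        matched_words = []
        for line in f:
            line = line.rstrip("\n")
            # re.match only matches at the start of the line
            if len(line) == 40 and not re.match(pattern, line):
                matched_words.append(line)

    print(matched_words)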
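
As a sketch of how this pattern behaves, the example below runs it on the question's sample lines held in a list, which stands in for the file. The last entry is a made-up line added to show the difference between `match` (anchored at the start) and `search` (anywhere in the line).

```python
import re

disregard = ["http", "gen"]
# re.escape guards against regex metacharacters in the disregarded words.
pattern = re.compile("|".join(re.escape(w) for w in disregard))

lines = [
    "http://1234ashajkhdajkhdajkhdjkaaaaaaad1",
    "a" * 40,
    "genp://1234ashajkhdajkhdajkhdjkaaaaaaad1",
    "a\\a",
    "x" * 36 + "http",  # hypothetical: 40 characters, "http" only at the end
]

# match(): reject lines that START with a disregarded word.
start_filtered = [
    line for line in lines if len(line) == 40 and not pattern.match(line)
]

# search(): reject lines that CONTAIN a disregarded word anywhere.
anywhere_filtered = [
    line for line in lines if len(line) == 40 and not pattern.search(line)
]
```

Which of the two is wanted depends on whether "starting with http" in the question is meant literally.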

edited May 23, 2018 at 7:28

answered May 23, 2018 at 5:29

Andrei Damian-Fekete

1 Comment

user2100826 Over a year ago

This solution has an edge case where the last line in the file (which won't end with \n) could be 41 characters and erroneously pass these conditions. Or it could be 40 characters and fail.
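
The edge case described here arises when the length check counts the line terminator. The sketch below assumes a check of the form `len(line) == 41` for comparison and uses `io.StringIO` as a stand-in for the opened file; stripping the newline before measuring treats the last line like every other line.

```python
import io

# Two 40-character lines; the last one has no trailing newline.
content = "a" * 40 + "\n" + "b" * 40


def raw_check(stream):
    """Length check that counts the trailing newline (41 characters)."""
    return [line for line in stream if len(line) == 41]


def stripped_check(stream):
    """Strip the newline first, then require exactly 40 characters."""
    stripped_lines = (line.rstrip("\n") for line in stream)
    return [line for line in stripped_lines if len(line) == 40]


raw = raw_check(io.StringIO(content))
stripped = stripped_check(io.StringIO(content))
```

With the raw check the final line is skipped because it has no newline to count; with the stripped check both lines are kept.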

End genocide - save Gaza · Accepted Answer · 2021-03-24 15:21:10Z

0

The basic structure looks OK; the problem is in the conditionals. You say you want to check whether each string starts with one of the supplied strings, but the code splits each line and then checks whether the whole word is one of those strings. Use .startswith() instead. This also means there doesn't have to be a space after "http" for that string to be caught.

Also, either the conditional testing should be placed after the loop that builds the words list, or else the words list should be reset at the start of each loop so you're not re-testing words you've already checked.

import glob

# adjusted some variable names for clarity
words = []
output = []
disregard = ["http","gen"]

for fname in glob.glob(r'MYDIR'):
    with open(fname, "r", encoding="utf-16") as f:
        text = f.read()
    lines = text.split("\n")

    for line in lines:
        words += line.split()

# filter once, after the words from every file have been collected
for word in words:
    if len(word) == 40 and not any(word.startswith(dis) for dis in disregard):
        output.append(word)
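
As a possible simplification, `str.startswith` also accepts a tuple of prefixes, which removes the need for `any(...)`. A minimal sketch on the question's sample input, with the file reading replaced by an in-memory string so the example is self-contained:

```python
disregard = ("http", "gen")  # a tuple can be passed to startswith directly

# Stand-in for the text read from the files.
text = (
    "http://1234ashajkhdajkhdajkhdjkaaaaaaad1\n"
    + "a" * 40 + "\n"
    + "genp://1234ashajkhdajkhdajkhdjkaaaaaaad1\n"
    + "a\\a\n"
)

output = [
    word
    for word in text.split()  # split() with no argument splits on any whitespace
    if len(word) == 40 and not word.startswith(disregard)
]
```

If `disregard` has to stay a list, `word.startswith(tuple(disregard))` does the same thing.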

edited Mar 24, 2021 at 15:21

End genocide - save Gaza

answered May 23, 2018 at 5:38

user2100826

2 Comments

Cheers Over a year ago

What if I want to store all the strings that start with disregard?

user2100826 Over a year ago

Just use any(...) instead of not any(...).
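
A sketch of that inversion on a hypothetical word list: dropping the `not` keeps only the 40-character words that do start with one of the entries in `disregard`.

```python
disregard = ["http", "gen"]

# Hypothetical sample words; the first three are 40 characters long.
words = ["http" + "x" * 36, "a" * 40, "gen" + "y" * 37, "http://short"]

starts_with_disregard = [
    word
    for word in words
    if len(word) == 40 and any(word.startswith(dis) for dis in disregard)
]
```

The length condition is kept here; removing it would also collect "http://short".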
