I couldn't figure out how to perform line.startswith("substring") for a set of substrings, so I tried a few variations on the code below. I have the luxury of knowing that every header prefix is exactly 4 characters long, but I'm pretty sure I've got the syntax wrong, since this doesn't reject any lines.

(Context: my aim is to throw out header lines when reading in a file. Header lines start with a limited set of strings, but I can't just check for the substring anywhere, because a valid content line may include a keyword later in the string.)

cleanLines = []
line = "sample input here"
if not line[0:3] in ["node", "path", "Path"]:  #skip standard headers
    cleanLines.append(line)
  • The ending index in string slicing is exclusive. You want line[0:4] or simply line[:4]. Commented Nov 6, 2015 at 18:56
  • Annnnd that was all it took. Fixed. If you put that as an answer, I'll pick it at once. Commented Nov 6, 2015 at 18:58
  • If you do know how to do it using a length-insensitive startswith(), I'd be super grateful. I hate brittle hacks. Commented Nov 6, 2015 at 19:00
  • .startswith() calls the "beginning substrings" prefixes. Commented Dec 15, 2020 at 12:59

1 Answer

Your problem stems from the fact that string slicing is exclusive of the stop index:

In [7]: line = '0123456789'

In [8]: line[0:3]
Out[8]: '012'

In [9]: line[0:4]
Out[9]: '0123'

In [10]: line[:3]
Out[10]: '012'

In [11]: line[:4]
Out[11]: '0123'

Slicing a string between i and j returns the substring starting at i, and ending at (but not including) j.
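
To make the link to the original bug explicit, here is a minimal sketch (the sample line is made up for illustration). A 3-character slice can never equal a 4-character header, so the membership test is always False and no line is ever rejected:

```python
headers = ["node", "path", "Path"]
line = "node 1 2 3"  # hypothetical header line, not from the real file

three = line[0:3]  # stop index 3 is excluded, so only 3 characters
four = line[0:4]   # first 4 characters

print(repr(three), three in headers)  # 3-char slice cannot match a 4-char header
print(repr(four), four in headers)    # 4-char slice matches "node"
```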

Just to make your code run faster, you might want to test membership in sets, instead of in lists:

cleanLines = []
line = "sample input here"
blacklist = {"node", "path", "Path"}
if line[:4] not in blacklist:  #skip standard headers
    cleanLines.append(line)

Now, what that code actually does is a startswith check, and startswith is not tied to any particular prefix length:

In [12]: line = '0123456789'

In [13]: line.startswith('0')
Out[13]: True

In [14]: line.startswith('0123')
Out[14]: True

In [15]: line.startswith('03')
Out[15]: False

So you could do this to exclude headers:

cleanLines = []
line = "sample input here"
headers = ["node", "path", "Path"]
if not any(line.startswith(header) for header in headers):  # skip standard headers
    cleanLines.append(line)

2 Comments

It's also possible to use any(map(line.startswith, headers)) which I find cool.
Both the above and that in the answer are not necessary. startswith can also take a tuple of options, so it can be if not line.startswith(tuple(headers)):
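
A rough sketch of the tuple form applied to several lines at once. The function name drop_headers and the sample lines are assumptions made up for illustration. Note that str.startswith accepts a single string or a tuple of strings, but not a list, hence the tuple:

```python
# Prefixes that mark a header line; must be a tuple for str.startswith.
HEADER_PREFIXES = ("node", "path", "Path")

def drop_headers(lines):
    """Return only the lines that do not start with a header prefix."""
    return [line for line in lines if not line.startswith(HEADER_PREFIXES)]

# Hypothetical input: three header lines and two content lines.
sample = [
    "node id x y",
    "path a b",
    "Path A B",
    "12 34 path 56",  # keyword appears later in the line, so it is kept
    "value node",
]

clean_lines = drop_headers(sample)
print(clean_lines)
```

Because the check only looks at the start of each line, a content line that mentions a keyword later on is kept, and the prefixes no longer need to share the same length.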
