
This should be a simple thing to do but I can't get it to work.

Say I have this string:

I want this string to be splitted into smaller strings.

I want to split it into smaller strings, but keep only what lies between a T and an S.

So the result should be:

this, to be s, to s, trings

So far I've tried splitting on every S and then working backwards to the nearest T. However, it only gets the first "this" and stops. How can I make it continue and get everything that lies between T's and S's?

(In this program I write the results to another text file.)

with open('string.txt', 'r') as matches, open('test.txt', 'a') as file:
    for line in matches:
        test = line.split("S")
        # only test[0], the text before the first "S", is ever examined
        file.write(test[0].split("T")[-1] + "\n")
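One way to make the backwards search keep going is to scan the string once, remembering the most recent T and emitting a piece whenever an S closes it. This is a minimal sketch, not the asker's code: it assumes matching ignores case (the expected output starts with a lowercase "this") and that a later T replaces an earlier one, which is what the expected output implies. The function name between_t_and_s is hypothetical.

```python
def between_t_and_s(text):
    """Hypothetical helper: collect every piece that starts at a 't' and ends
    at the next 's', with no other 't' or 's' in between (case-insensitive)."""
    results = []
    start = None  # index of the most recent 't' not yet closed by an 's'
    for i, ch in enumerate(text):
        c = ch.lower()
        if c == 't':
            start = i  # a later 't' replaces an earlier one
        elif c == 's':
            if start is not None:
                results.append(text[start:i + 1])
            start = None  # every 's' closes the current window
    return results


sample = 'I want this string to be splitted into smaller strings.'
print(between_t_and_s(sample))  # ['this', 'to be s', 'to s', 'trings']
```

Each character is visited once, so this runs in linear time; for ordinary ASCII text it returns the same pieces as the regular expression t[^ts]*s used with re.I.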

Would regular expressions be better here? I don't know how to work with them very well.

  • @thefourtheye When it finds the first S, it would go backwards looking for a T. When it finds the first T (aka this), it would forget about that part and keep going. After that it finds another S, but since it has already gone through what is before that S, it wouldn't care about it and simply not find a match until it gets to the first S in 'Splitted', goes back to find a T, etc. Mmmh, maybe it's quite messy in my mind. :-) Commented Jan 6, 2014 at 14:27
  • @BrickTop: That makes no sense, really, because there is a t in tring to be s, for example. Commented Jan 6, 2014 at 14:28
  • He obviously needs a computer to solve the task he cannot solve by hand without errors. I think that's okay. Commented Jan 6, 2014 at 14:29
  • Now that's my mind playing games. Indeed, the question was wrong: there was another T. I've edited it to reflect what it should actually show; sorry for the misunderstanding. @Alfe, I'm not sure if your comment was meant to be offensive, but I'm actually trying to do this with a 400K-character string. Commented Jan 6, 2014 at 14:36
  • No, no offense intended. Such "errors" in expected output derived from executing a wanted algorithm manually are quite common, actually. And you explained what your algorithm should do quite elaborately in your comment. (But of course Martijn's complaint about your imperfectness triggered my wording.) Commented Jan 6, 2014 at 14:40

1 Answer


You want a re.findall() call instead:

re.findall(r't[^s]*s', line, flags=re.I)

Demo:

>>> import re
>>> sample = 'I want this string to be splitted into smaller strings.'
>>> re.findall(r't[^s]*s', sample, flags=re.I)
['t this', 'tring to be s', 'tted into s', 'trings']

Note that this matches 't this' and 'tted into s'; your rules need clarification as to why those first t characters shouldn't match when 'trings' does.

It sounds as if you want to match only text between t and s without including any other t:

>>> re.findall(r't[^ts]*s', sample, flags=re.I)
['this', 'to be s', 'to s', 'trings']

Here the tring from the second result and the tted from the third are excluded, because another t follows them before the s.
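To plug the refined pattern into the question's file-based program, one option is to compile it once, annotated with re.VERBOSE so each part is explained, and run findall over each line. This is a sketch under stated assumptions: the file names come from the question, and extract_pieces and write_pieces are hypothetical helper names.

```python
import re

# The refined pattern t[^ts]*s, spelled out with re.VERBOSE so each part
# can carry a comment; re.I makes 't' and 's' also match 'T' and 'S'.
PATTERN = re.compile(r"""
    t        # start at a 't'
    [^ts]*   # then any characters that are neither 't' nor 's'
    s        # and stop at the first 's'
""", flags=re.I | re.VERBOSE)


def extract_pieces(lines):
    """Hypothetical helper: yield every match found in an iterable of lines."""
    for line in lines:
        yield from PATTERN.findall(line)


def write_pieces(src_path, dst_path):
    """Hypothetical helper mirroring the question's program: read src_path
    line by line and append one match per line to dst_path."""
    with open(src_path) as src, open(dst_path, 'a') as dst:
        for piece in extract_pieces(src):
            dst.write(piece + '\n')
```

With the question's file names that would be write_pieces('string.txt', 'test.txt'). Because the loop works line by line, a match never spans a line break; reading the whole file with src.read() and calling PATTERN.findall once would allow that, and a 400K-character string fits easily in memory.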


8 Comments

You probably want t[^st]*s as the regex, based on his example output. Otherwise, a very good solution.
Still, he expects this, tring to be s, ted into s, trings :(
@thefourtheye: which, according to his own description, is the wrong output since there are other t characters before and after the ones his samples use as first character.
Yup... Even I am quite confused
Sorry @thefourtheye, I was confused too; that's the result of spending too many hours on this. :-) My example was wrong, and this is exactly what I wanted to achieve.
