1

I have a long text and a list of dicts holding offsets into that text. I want to insert strings at those offsets. If I do it in a loop, each insertion shifts the later offsets, so I have to recalculate them, which I find very confusing. Is there a way to insert different strings at different offsets in a single pass?

My sample data:

main_str = 'Lorem Ipsum is simply dummy text of the printing and typesetting industry.'

My indexes list:

indexes_list = [
    {
      "type": "first_type",
      "endOffset": 5,
      "startOffset": 0,
    },
    {
      "type": "second_type",
      "endOffset": 22,
      "startOffset": 16,
    }
]

My main purpose: I want to wrap the given ranges in <span> tags with color styles based on their types, and then render the result directly in a template. Do you have another suggestion?

For example, I want to build the following from the main_str and indexes_list variables above. (Please ignore the color part of the styles; I supply it dynamically from the type value in indexes_list.)

new_str = '<span style="color:#FFFFFF">Lorem</span> Ipsum is <span style="color:#FFFFFF">simply</span> dummy text of the printing and typesetting industry.'

3 Answers

1

Build a new string so that main_str is left unchanged:

main_str = 'Lorem Ipsum is simply dummy text of the printing and typesetting industry.'
indexes_list = [
    {
      "type": "first_type",
      "startOffset": 0,
      "endOffset": 5,
    },
    {
      "type": "second_type",
      "startOffset": 16,
      "endOffset": 22,
    }
]

new_str = ""
index = 0
for i in indexes_list:
    start = i["startOffset"]
    end = i["endOffset"]
    new_str += main_str[index: start] + "<span>" + main_str[start:end] + "</span>"
    index = end
new_str += main_str[index:]
print(new_str)
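
The question also asks for a color per type. Here is a minimal sketch of that extension, not part of the original answer. It assumes a hypothetical TYPE_COLORS mapping (the real colors come from the asker's own data) and spans that do not overlap. html.escape is added on the assumption that the result is rendered directly in a template, so any markup characters in the text should be escaped:

```python
from html import escape

# Hypothetical mapping from span type to color; not part of the original post.
TYPE_COLORS = {"first_type": "#FF0000", "second_type": "#0000FF"}


def wrap_spans(text, spans, colors=TYPE_COLORS, default="#000000"):
    """Wrap each [startOffset, endOffset) range of text in a colored <span>.

    Assumes the spans do not overlap. They are sorted here, so the
    input order does not matter.
    """
    parts = []
    last = 0
    for span in sorted(spans, key=lambda s: s["startOffset"]):
        start, end = span["startOffset"], span["endOffset"]
        color = colors.get(span["type"], default)
        # Untouched text between the previous span and this one.
        parts.append(escape(text[last:start]))
        parts.append(
            f'<span style="color:{color}">{escape(text[start:end])}</span>'
        )
        last = end
    # Remaining text after the last span.
    parts.append(escape(text[last:]))
    return "".join(parts)
```

Calling wrap_spans(main_str, indexes_list) builds the whole string once with a single join, so no offset ever has to be recalculated.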

1 Comment

Your solution works correctly. Thanks for your answer. Actually, I was asking whether it is possible to do this in a single pass instead of a loop.
1

Here is a solution without any imperative for loops. It still uses plenty of looping for the list comprehensions.

# Get all the indices and label them as starts or ends.
starts = [(o['startOffset'], True) for o in indexes_list]
ends = [(o['endOffset'], False) for o in indexes_list]

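# Sort everything, adding sentinels at 0 and len(main_str) so the text
# before the first span and after the last span is not dropped...
all_indices = sorted(starts + ends + [(0, False), (len(main_str), False)])

# ...so it is possible to zip together adjacent pairs and extract substrings.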
pieces = [
    (s[1], main_str[s[0]:e[0]])
    for s, e in zip(all_indices, all_indices[1:])
]

# And then join all the pieces together with a bit of conditional formatting.
formatted = ''.join([
    f"<span>{part}</span>" if is_start else part
    for is_start, part in pieces
])

formatted
# '<span>Lorem</span> Ipsum is s<span>imply </span>dummy text of the printing and typesetting industry.'

Also, although you said you do not want for loops, it is important to note that you do not have to do any index modification if you do the updates in reverse order.

def update_str(s, spans):
    for lookup in sorted(spans, reverse=True, key=lambda o: o['startOffset']):
        start = lookup['startOffset']
        end = lookup['endOffset']
        before, span, after = s[:start], s[start:end], s[end:]
        s = f'{before}<span>{span}</span>{after}'
    return s

update_str(main_str, indexes_list)
# '<span>Lorem</span> Ipsum is s<span>imply </span>dummy text of the printing and typesetting industry.'

4 Comments

Thanks for your notes and answer; I updated the indexes. Actually, I need a new string with the extra strings inserted at the given indexes. I don't need a dict object.
Could you provide the output you are expecting for your example data?
I've added the output format I want.
Okay, I have implemented this without any loops for you. Or at least, without any procedural for loops. All of the list comprehensions are still technically loops.
-1

The unvisited insertion indices won't change if you iterate backwards. This is true for all such problems. It sometimes even lets you modify sequences during iteration if you're careful (not that I'd ever recommend it).

You can find all insertion points from the dict, sort them backwards, and then do the insertion. For example:

items = ['<span ...>', '</span>']
keys = ['startOffset', 'endOffset']
insertion_points = [(d[key], item) for d in indexes_list for key, item in zip(keys, items)]
insertion_points.sort(reverse=True)

for index, content in insertion_points:
    main_str = main_str[:index] + content + main_str[index:]

The reason not to do that is that it's inefficient. For reasonable sized text that's not a huge problem, but keep in mind that you are chopping up and reallocating an ever increasing string with each step.

A much more efficient approach would be to chop up the entire string once at all the insertion points. Adding list elements at the right places with the right content would be much cheaper that way, and you would only have to rejoin the whole thing once:

items = ['<span ...>', '</span>']
keys = ['startOffset', 'endOffset']
insertion_points = [(d[key], item) for d in indexes_list for key, item in zip(keys, items)]
insertion_points.sort()

last = 0
chopped_str = []
for index, content in insertion_points:
    chopped_str.append(main_str[last:index])
    chopped_str.append(content)
    last = index
chopped_str.append(main_str[last:])
main_str = ''.join(chopped_str)
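
To check that the two approaches in this answer agree, both can be wrapped as functions so that main_str is not overwritten. This is only a sketch: the function names and the plain <span> tags are placeholders chosen for illustration, and it assumes the spans do not overlap.

```python
def tag_points(spans, open_tag="<span>", close_tag="</span>"):
    """Return an (index, tag) pair for every span boundary."""
    points = []
    for d in spans:
        points.append((d["startOffset"], open_tag))
        points.append((d["endOffset"], close_tag))
    return points


def insert_backwards(text, spans):
    """Insert tags from the end of the text toward the start.

    Earlier indexes stay valid because only text after them changes.
    """
    for index, content in sorted(tag_points(spans), reverse=True):
        text = text[:index] + content + text[index:]
    return text


def insert_by_chopping(text, spans):
    """Cut the text once at every insertion point and join a single time."""
    last = 0
    chopped = []
    for index, content in sorted(tag_points(spans)):
        chopped.append(text[last:index])
        chopped.append(content)
        last = index
    chopped.append(text[last:])
    return "".join(chopped)
```

For the sample data in the question, insert_backwards(main_str, indexes_list) and insert_by_chopping(main_str, indexes_list) return the same string.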
