
I have a table with HTML content, and I need to select all rows containing a span element with the CSS classes "glyphicon glyphicon-film" that has some text content. Rows where that span is empty or contains only whitespace (a space or &nbsp;) should be excluded.

These rows should NOT match:

<span class="glyphicon glyphicon-film">&nbsp;</span>
<span class="glyphicon glyphicon-film"></span>
<span class="glyphicon glyphicon-film"> </span>
<span class="hypo-link-target" data-link-target-id="1379">Hammartå</span>

These rows should match:

<span class="glyphicon glyphicon-film">nån text</span>
<span class="glyphicon glyphicon-film">Nån Text</span>
<span class="glyphicon glyphicon-film">NÅN TEXT</span>
<span class="glyphicon glyphicon-film">&nbsp;nån text</span>
<span class="glyphicon glyphicon-film">&nbsp;Nån Text</span>
<span class="glyphicon glyphicon-film">&nbsp; nån text</span>
<span class="glyphicon glyphicon-film">&nbsp; Nån Text</span>
<span class="glyphicon glyphicon-film"> nån text</span>
<span class="glyphicon glyphicon-film"> Nån Text</span>

The query I have at the moment:

select * from Section 
where (Text regexp '(span class="glyphicon glyphicon-film">&nbsp;.+</span>)' 
or Text regexp 'span class="glyphicon glyphicon-film">[a-zA-Z][|å|ä|ö|Å|Ä|Ö]+</span>');

Which regexp should I use to get the desired result?

UPDATE: Based on the suggestions in the answer, I ended up with the query below, which I think works for all cases. It also matches rows with two or more span elements where some are valid and some are not, like this one:

<span class="glyphicon glyphicon-film">&nbsp;Nån Text</span>more content here... and a correct span <span class="glyphicon glyphicon-film">&nbsp;</span>

select * from Section 
where Text regexp 'span class="glyphicon glyphicon-film">&nbsp;.+</span>' 
        or (Text regexp 'span class="glyphicon glyphicon-film">[a-zA-ZåäöÅÄÖ ]+</span>'
and Text not regexp 'span class="glyphicon glyphicon-film"> </span>');

1 Answer

The pattern [a-zA-Z][|å|ä|ö|Å|Ä|Ö]+ matches exactly one ASCII letter followed by one or more characters from the set |åäöÅÄÖ. What you want is a run of any combination of ASCII letters, the Swedish letters, and spaces (which you left out). That would be [a-zA-ZåäöÅÄÖ ]+.

Inside [], | does not separate alternatives; it is just a literal character, because [] already lists alternative characters.
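To make the difference between the two character classes concrete, here is a minimal sketch using Python's `re` module. This is an assumption for illustration only: Python's engine is not MySQL's, but both treat these simple bracket expressions the same way. The variable and function names are hypothetical.

```python
import re

# Character class from the question: '|' inside [] is a literal pipe.
original = re.compile(r'[a-zA-Z][|å|ä|ö|Å|Ä|Ö]+')
# Corrected class: ASCII letters, the Swedish letters, and a space.
corrected = re.compile(r'[a-zA-ZåäöÅÄÖ ]+')


def full(pattern, text):
    """Return True if the whole string is matched by the pattern."""
    return pattern.fullmatch(text) is not None


# Pipes are accepted as ordinary characters by the original class.
pipes_accepted = full(original, "a|||")
# "nån text" fails: after 'n' and 'å', the next 'n' is not in the second class.
original_accepts_text = full(original, "nån text")
# The corrected class accepts the whole string, including the space.
corrected_accepts_text = full(corrected, "nån text")
```

In the real query the class sits between the opening tag and `</span>`, which plays the same role as the whole-string check here.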

select * from Section 
where (Text regexp 'span class="glyphicon glyphicon-film">&nbsp;.+</span>' 
        or Text regexp 'span class="glyphicon glyphicon-film">[a-zA-ZåäöÅÄÖ ]+</span>')
and Text not regexp 'span class="glyphicon glyphicon-film"> +</span>';

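The last condition is needed to prevent <span class="glyphicon glyphicon-film"> </span> from matching, since a lone space also satisfies [a-zA-ZåäöÅÄÖ ]+. Note that it is applied to the whole row, so a row that contains both a valid span and a space-only span is excluded as well.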
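The logic of the three conditions can be checked against the sample rows from the question with a small sketch. It uses Python's `re` as a stand-in for MySQL's REGEXP (an assumption: the engines differ in details such as multibyte handling, so treat this as a sanity check, not a guarantee). `row_matches` and the constant names are hypothetical helpers.

```python
import re

PREFIX = 'span class="glyphicon glyphicon-film">'

# The three patterns from the query above.
NBSP_THEN_TEXT = re.compile(PREFIX + r'&nbsp;.+</span>')
LETTERS_OR_SPACES = re.compile(PREFIX + r'[a-zA-ZåäöÅÄÖ ]+</span>')
ONLY_SPACES = re.compile(PREFIX + r' +</span>')


def row_matches(text):
    """Mimic (A or B) and not C; search() is unanchored, like REGEXP."""
    positive = NBSP_THEN_TEXT.search(text) or LETTERS_OR_SPACES.search(text)
    return bool(positive) and not ONLY_SPACES.search(text)


should_not_match = [
    '<span class="glyphicon glyphicon-film">&nbsp;</span>',
    '<span class="glyphicon glyphicon-film"></span>',
    '<span class="glyphicon glyphicon-film"> </span>',
    '<span class="hypo-link-target" data-link-target-id="1379">Hammartå</span>',
]
should_match = [
    '<span class="glyphicon glyphicon-film">nån text</span>',
    '<span class="glyphicon glyphicon-film">NÅN TEXT</span>',
    '<span class="glyphicon glyphicon-film">&nbsp;nån text</span>',
    '<span class="glyphicon glyphicon-film">&nbsp; Nån Text</span>',
    '<span class="glyphicon glyphicon-film"> nån text</span>',
]

rejected = [row_matches(row) for row in should_not_match]
accepted = [row_matches(row) for row in should_match]
```

If the sketch behaves as intended, every entry in `rejected` is False and every entry in `accepted` is True.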

DEMO


4 Comments

Thanks for the tip, but it didn't work. With your change I only got matches on the rows containing "&nbsp;xxx", nothing else.
"space" is still missing from the regexp.
@poa I've updated the answer and added a demo showing it works.
@Barmar ahh yes this was a lot better. With a slight modification I think it works fine now. I updated the question with the query I think is good. Thanks.
