I'm trying to extract the alphanumeric text from strings that also contain some information within square brackets.

Ex:

  • this is some sample text [first sentence]
  • [second sentence][important] some more sample text
  • [not important] this is sample as well

I want the output to be:

  • this is some sample text
  • some more sample text
  • this is sample as well

I tried using a negative lookahead and extracting the pattern before '[', but that works only for a few cases.

  • How would you like SQL to feature in this? Commented Aug 21, 2019 at 13:30
  • @CaiusJard I'm querying a set of string values logged in a table using Presto. It uses the same regex functionality as JavaScript. Commented Aug 21, 2019 at 13:32
  • Instead of trying to extract the text that is not in brackets, can you run a regex replace to remove everything that is in brackets? E.g., replace " *\[.*?\] *" with ""? Commented Aug 21, 2019 at 13:39

2 Answers

Per my comment, and after a quick look at the fine Presto manual, could you try:

SELECT regexp_replace('[second sentence][important] some more sample text', ' *\[.*?\] *');

The regex matches any number of spaces, then an opening square bracket, then as few characters as possible up to the next closing square bracket, then that closing bracket, then any number of spaces.

I dug the function out of the manual (I have no access to Presto and have never used it); I presume that with only two arguments it implicitly replaces matches with nothing.
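
For anyone who wants to try the pattern without a Presto instance, here is a minimal sketch in JavaScript, used only as a stand-in. It assumes that `String.prototype.replace` with the `g` flag behaves like a replace-all for this pattern; `stripBrackets` is a hypothetical helper name, not a Presto function.

```javascript
// JavaScript stand-in for a regex replace-all; stripBrackets is a hypothetical name.
function stripBrackets(text) {
  // " *"       any number of spaces
  // "\[.*?\]"  an opening bracket, as few characters as possible, then the closing bracket
  // " *"       any number of spaces
  // The g flag replaces every match, not just the first one.
  return text.replace(/ *\[.*?\] */g, '')
}

// The three sample strings from the question.
const samples = [
  'this is some sample text [first sentence]',
  '[second sentence][important] some more sample text',
  '[not important] this is sample as well',
]

for (const sample of samples) {
  console.log(stripBrackets(sample))
}
```

On the three sample strings this prints the three desired outputs from the question.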

2 Comments

Thank you, that worked. However I used REGEXP_REPLACE(string_field, '\[.*?\]', '')
No problem. I had the extra space-asterisk in there to try to nuke the leading and trailing spaces around the brackets too. You could also TRIM, or whatever Presto does for removing whitespace from the ends of strings.
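
The trade-off between these two patterns can be sketched in JavaScript (again only as a stand-in for Presto; both function names are hypothetical). The space-absorbing pattern joins neighbouring words when a bracketed group sits in the middle of a string, while the bracket-only pattern leaves stray spaces that need to be collapsed and trimmed afterwards.

```javascript
// Hypothetical helper names; JavaScript is used here only as a stand-in for Presto.

// Variant A: the pattern also swallows the spaces around each bracketed group.
function stripWithSpaces(text) {
  return text.replace(/ *\[.*?\] */g, '')
}

// Variant B: remove only the bracketed groups, then tidy the whitespace.
function stripThenTrim(text) {
  return text
    .replace(/\[.*?\]/g, '') // drop each [...] group
    .replace(/ {2,}/g, ' ')  // collapse runs of spaces left behind
    .trim()                  // remove leading and trailing whitespace
}

const midString = 'some [note] text'
console.log(stripWithSpaces(midString)) // "sometext"
console.log(stripThenTrim(midString))   // "some text"
```

None of the sample strings in the question has a bracketed group between two words, so either variant works for them; the difference only shows up on input like `midString`.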

You could use a regex for this, but I think writing your own function would also work well.

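function getText(bracketedText) {
  let idx = 0 // start of the next run of text outside brackets
  let str = ''
  while (idx < bracketedText.length) {
    const openIdx = bracketedText.indexOf('[', idx)
    if (openIdx < 0) {
      // No more brackets: keep the rest of the string.
      str += bracketedText.slice(idx)
      break
    }
    // Keep the text between the previous ']' and this '['.
    str += bracketedText.slice(idx, openIdx)
    const closeIdx = bracketedText.indexOf(']', openIdx + 1)
    if (closeIdx < 0) {
      // Unmatched '[': keep the remainder as-is instead of looping forever.
      str += bracketedText.slice(openIdx)
      break
    }
    idx = closeIdx + 1
  }
  return str
}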

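This should be fairly efficient at stripping out anything in brackets. Note that it leaves the spaces around the brackets in place, so you may still want to trim the result.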
