1

Good morning,

I am hoping to find assistance with writing a select query to remove some text from a column.

In a previous query I created a column called "TEXT_MINING"; code written by a different developer will run some text mining analysis on it. The TEXT_MINING column contains text that looks like:

EMPLOYEE FOUND BROKEN HANDLE ON HAMMER * 02-08-18 15:19:22 PM * I found a hammer that had the wood split on the handle, tossed into scrap.

I want to remove both * characters and all of the text between them to help my software engineer do some text mining. Here is my current dilemma:

Not only do I not know how to use REGEXP_REPLACE, but I can't get the REGEXP worked out. I currently have:

^[*]\w[*]$

So it looks like:

REGEXP_REPLACE(col, '^[*]\w[*]$', '')

Could anyone advise?

Thank you!

4
  • I fail to see why "text mining" cannot handle the original column, but that is an entirely different matter. Commented Feb 8, 2018 at 13:17
  • @GordonLinoff You are exactly correct there...but I am just trying to do as I am told Commented Feb 8, 2018 at 13:17
  • Can the string have multiple *s? Commented Feb 8, 2018 at 13:20
  • Never, at least not that I have EVER seen @GordonLinoff. However, I think the solutions below worked! Commented Feb 8, 2018 at 13:21

2 Answers

3

You may use this approach to remove one or more occurrences of *...* substrings from your column:

SELECT REGEXP_REPLACE(
   'EMPLOYEE FOUND BROKEN HANDLE ON HAMMER * 02-08-18 15:19:22 PM * I found a hammer that had the wood split on the handle, tossed into scrap.', 
   '\s*\*[^*]*\*', 
   ''
) as Result from dual

Pattern details

  • \s* - zero or more whitespace characters
  • \* - a literal * char
  • [^*]* - zero or more chars other than *
  • \* - a literal * char.
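
As a rough sketch of how the pattern above might be applied to the column itself rather than to a string literal: the table name incident_reports is hypothetical (the question does not name the table), and only the TEXT_MINING column name and the sample row come from the question.

```sql
-- incident_reports is a hypothetical stand-in for the real table;
-- the single sample row is the text quoted in the question.
WITH incident_reports (text_mining) AS (
  SELECT 'EMPLOYEE FOUND BROKEN HANDLE ON HAMMER * 02-08-18 15:19:22 PM * I found a hammer that had the wood split on the handle, tossed into scrap.'
  FROM dual
)
SELECT REGEXP_REPLACE(text_mining, '\s*\*[^*]*\*', '') AS cleaned_text
FROM incident_reports;
```

Because the leading \s* consumes the space before the first *, while the space after the second * is kept, the cleaned text should read "...ON HAMMER I found a hammer..." with a single space at the join.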

4 Comments

Thank you very much for this and for your explanation! This is great!
This is a safer solution because it stops at the next * rather than the last one.
Wiktor, this is fantastic and I am going to move forward from this. I also appreciate you taking the time to break out the "pattern details" for me. I learn a lot on this and am appreciative of your time.
@bm0r3son Note that * is a special char (an operator, called a quantifier, denoting 0 or more occurrences of the pattern it modifies), thus you need to escape it to match it as a literal * char (like "\*"). If it is inside a character set (a bracket expression), you do not have to escape it ("[*]").
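
To illustrate the escaping point in the comment above, here is a small sketch (the sample string 'a*b*c' is made up for the illustration). Both spellings match a literal *, so both columns should come back as abc.

```sql
-- Two equivalent ways to match a literal asterisk:
-- a backslash escape, or a single-character bracket expression.
SELECT REGEXP_REPLACE('a*b*c', '\*', '')  AS escaped_star,
       REGEXP_REPLACE('a*b*c', '[*]', '') AS bracketed_star
FROM dual;
```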
2

This could be a way:

select regexp_replace(yourString, '\*.*\*', '') from yourTable

Please notice that this will remove everything between the first and the last '*' in the string; for example:

with test(x) as (
select 'Something * something else * and a * just before another * and something more' from dual
)
select regexp_replace(x, '\*.*\*', '') from test

gives:

Something  and something more
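
For comparison, a sketch that runs the same test string through three patterns side by side. The column aliases are made up for the illustration, and the non-greedy form \*.*?\* assumes an Oracle version that supports the Perl-influenced lazy quantifiers.

```sql
WITH test (x) AS (
  SELECT 'Something * something else * and a * just before another * and something more' FROM dual
)
SELECT REGEXP_REPLACE(x, '\*.*\*', '')    AS greedy,        -- first * through the last *
       REGEXP_REPLACE(x, '\*[^*]*\*', '') AS negated_class, -- each * through the next *
       REGEXP_REPLACE(x, '\*.*?\*', '')   AS non_greedy     -- lazy match, also stops at the next *
FROM test;
```

The greedy column should match the output shown above, while the other two should keep the words "and a", because each match ends at the next * instead of the last one.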

4 Comments

I employed this method and it worked perfectly. It is simple and effective. Would it be better to use Wiktor's regular expression above? This method worked amazing for me. Thank you!
@bm0r3son This \*.*\* will make "Hi " out of "Hi *Tom*, have you met *Jim?*" because .* matches up to the last * occurrence.
@bm0r3son: the two expressions do different things; which one to use depends only on what best suits your needs
Thank you so much!
