I am stuck in a situation where I have a field in mysql which is a long html field. I need to extract the words between html tags.
Say,
<!DOCTYPE html><html><body><h1>My First Heading</h1><p>My first paragraph.</p></body></html>
I need something like this.
"My First Heading My first paragraph"
I am currently doing it in java from an exported csv file using a function like that of the following:
public String getStringFromHtml(String html) {
String nohtml = html.toString().replaceAll("<[^>]*>"," ");
return nohtml.trim().replaceAll("\\s+", " ");
}
But lets assume that I am just using Mysql workbench (no server side scripts) for some data analysis.
I was still wondering whether there are any ways that Mysql allow us to eliminate the html tags and extract the words in between. I tried to search all over stack overflow & google, I was not lucky as there is only advice to use it in PHP or java or stored procedures.
Still there is no way to extract html text using SQL ?