3

I've noticed that MySql has an extensive search capacity, allowing both wildcards and regular expressions. However, I'm in somewhat in a bind since I'm trying to extract multiple values from a single string in my select query.

For example, if I had the text "<span>Test</span> this <span>query</span>", perhaps using regular expressions I could find and extract values "Test" or "query", but in my case, I have potentially n such strings to extract. And since I can't define n columns in my select statement, that means I'm stuck.

Is there anyway I could have a list of values (ideally separated by commas) of any text contained with span tags?

In other words, if I ran this query, I would get "Test,query" as the value of spanlist:

select <insert logic here> as spanlist from HtmlPages ...
6
  • Thought : Return several records for each HtmlPages record, not several columns (and certainly not a comma delimitered list unless you REALLY have/need to) (in other words, a normalised result set). Commented Jun 27, 2011 at 12:00
  • I could, but I have reasons for making it a comma-separated list. The resulting query will be exported into a csv file and eventually into an excel document. Though lacking that, having multiple records, one for each span content, would be better than nothing. Commented Jun 27, 2011 at 12:05
  • Possible duplicate: stackoverflow.com/questions/4021507/… Commented Jun 27, 2011 at 12:24
  • With all due respect, I don't think this is a duplicate because it is the solution to finding two words, not n words. Commented Jun 27, 2011 at 12:31
  • To be fair to the OP, just because using SQL for splitting a string is not ideal, one should not assume that using another language in another tier is ideal either. This leads to a least-worst-choice, and to make that choice the OP needs to know the SQL solution, it's costs and benefits. Commented Jun 27, 2011 at 13:29

4 Answers 4

1

MySQL does have a regular expression engine, but it is implemented in the WHERE clause, not as part of the query field. It therefore cannot be used to split a string.

I would suggest that you'll need to use another language for this. Query the data using an application and split it as required there, or write a small program to do it - it'd probably be just a few lines in PHP.

If you really want to do it in MySQL, you could probably write a function to do it -- see this answer for some leads: Can Mysql Split a column? -- but frankly, it's not what MySQL (or SQL in general) is designed to do. You'd be much better off querying the data normally and splitting it into bits in another language.

( I mention PHP as it's what MediaWiki is written with, so you should be able to run PHP code without installing anything new, but pretty much any language will do).

Sign up to request clarification or add additional context in comments.

Comments

1

First of all - your data structure is miserable. One of the most common database rules is "Every row-and-column intersection contains exactly one value from the applicable domain (and nothing else)." It's one of the elementary rules when developing data structure.

As you see there is no possibility to write static select statement returning variable count of columns. The best approach if you do not want to change structure is to get text as is and post-process it in your favorite programming language.

3 Comments

I agree, but the question is still a good question, perhaps with different data
I used html as an example, though in reality, I'm trying to extract categories from pages in a mediawiki. They're all formatted [[Category:Category name]]. They're usually together, but would be nice to be able to provide a comma-separated list to my boss rather than clump of wiki code that is probably accurate.
SQL is a language for manipulating relational datasets. This is not relational dataset - that's all. Best way probably will be export tags and give them proper format using some calc or excel.
1

I'm surprised no one suggested EXTRACTVALUE(xml,'xpath') I think it does exactly what you are asking for, only with a little bit of trickery to get that delimiter in there (default "delimiter" is a space).

SET @xml = '<html><span>Test</span> this <span>query</span>
    <span>etc</span><div><span>etc etc</span></div></html>';

SELECT 
    LEFT(spanlist,LENGTH(spanlist)-1) AS spanlist
FROM
    (SELECT 
        EXTRACTVALUE(REPLACE(@xml,'</span>',',</span>'),'//span') AS spanlist
    ) AS T
;

Produces

+---------------------------+
| spanlist                  |
+---------------------------+
| Test, query, etc, etc etc |
+---------------------------+

Sorry this answer may be too late for you but hope it helps the next person

1 Comment

+1. Albeit a bit late, that is still an excellent answer and I appreciate the effort. Unfortunately it doesn't serve me now, but I'll remember this next time something like this comes up.
-1

Unfortunately the only solution available was to use a third party text editor such as Ultraedit and remove the tags using regular expressions. It seems it's not possible to use MySql.

Though this is not an idealic solution. If anyone ever stumbles across this question with a solution that will allow me to use MySQL to extract this information multiple lines or otherwise, I'd be more than happy to hear it.

Lacking a better solution, this will have to answer my question for now.

1 Comment

Ultraedit is not a optimal solution and your question is tagged as MYSQL and this solution is posted as comment so please remove.My suggestion is to use while loops .

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.