5

I have a text field which looks like:

option[A]sum[A]g3et[B]

I want to get the text which is inside the [ ] without duplicates. Meaning to get:

A
B

There can't be a case of double like [ [ ] ].

I know this is a horrible way to save data in databases. I can not change how the data is saved. I just need to get a very specific (one time) information from this column.

I tried to do:

SELECT substring_index(substring_index(sentence, '[', -1),']', 1)
FROM (SELECT 'THIS[A] IS A TEST' AS sentence) temp;

This gives me A, but it will not work for many [].

I thought of using regex however I don't know how many [ ] I have.

How do I do that?

8
  • 1
    You have already asked this question today; stackoverflow.com/questions/37026194/… Commented May 4, 2016 at 11:55
  • @jarlh page not found, deleted I think. Commented May 4, 2016 at 11:56
  • @jarlh I deleted and re post to make it clearer. Commented May 4, 2016 at 11:56
  • Do you have a list of possible values inside the braces? Otherwise, this is rather difficult in MySQL, which doesn't even support table-valued functions. Commented May 4, 2016 at 11:57
  • @GordonLinoff values inside the [] are letters and numbers. Commented May 4, 2016 at 11:59

1 Answer 1

6

It is not job for DB but it is possible:

CREATE TABLE tab(id INT, col VARCHAR(100));           
INSERT INTO tab(id, col) 
VALUES (1, 'option[A]sum[A]g3et[B]'), (2, '[Cosi]sum[A]g3et[ZZZZ]');      

SELECT DISTINCT *
FROM (
  SELECT id, RIGHT(val, LENGTH(val) - LOCATE('[', val)) AS val
  FROM
  (
    SELECT id, SUBSTRING_INDEX(SUBSTRING_INDEX(t.col, ']', n.n), ']', -1) AS val
    FROM tab t 
    CROSS JOIN 
    (
     SELECT a.N + b.N * 10 + 1 n
       FROM 
      (SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) a
      ,(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) b
    ) n
    WHERE n.n <= 1 + (LENGTH(t.col) - LENGTH(REPLACE(t.col, ']', '')))
  ) sub
) s
WHERE val <> ''
ORDER BY ID;

SqlFiddleDemo

Note:

Depending on col maximum length you may need to generate more numbers in CROSS JOIN section. For now it is up to 100.

Output:

enter image description here

How it works:

  1. Generate number table with CROSS JOIN
  2. Split string based on ] as delimeter
  3. RIGHT(val, LENGTH(val) - LOCATE('[', val)) remove the part up to [
  4. filter out empty records
  5. Get only DISTINCT values

Inner most query:

╔════╦══════════╗
║ id ║   val    ║
╠════╬══════════╣
║  1 ║ option[A ║
║  1 ║ sum[A    ║
║  1 ║ g3et[B   ║
║  1 ║          ║
╚════╩══════════╝

Second subquery:

╔════╦═════╗
║ id ║ val ║
╠════╬═════╣
║  1 ║ A   ║
║  1 ║ A   ║
║  1 ║ B   ║
║  1 ║     ║
╚════╩═════╝

And outermost query:

╔════╦═════╗
║ id ║ val ║
╠════╬═════╣
║  1 ║ A   ║
║  1 ║ B   ║
╚════╩═════╝

I need the result of query per row.. not combined

So add simple:

WHERE n.n <= 1 + (LENGTH(t.col) - LENGTH(REPLACE(t.col, ']', '')))
  AND t.id = ?

EDIT 2:

see http://sqlfiddle.com/#!9/8ee95/1 your query works partially for my data. I also changed the type to longtext.

You want to parse JSON in MySQL. As I said before parse and get value in application layer. This answer is only for demo/toys purpose and will have very low performamce.

If you still insist on SQL solution:

SELECT id, val,s.n
FROM (
  SELECT id, RIGHT(val, LENGTH(val) - LOCATE('[', val)) AS val,n
  FROM
  (
    SELECT id, SUBSTRING_INDEX(SUBSTRING_INDEX(t.col, ']', n.n), ']', -1) AS val, n.n
    FROM (SELECT id, REPLACE(col, '[]','') as col FROM tab) t
    CROSS JOIN 
    (
     SELECT e.N * 10000 + d.N * 1000 + c.N * 100 + a.N + b.N * 10 + 1 n
       FROM 
      (SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) a
      ,(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) b
      ,(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) c
      ,(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) d
      ,(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) e

    ) n
    WHERE n.n <= 1 + (LENGTH(t.col) - LENGTH(REPLACE(t.col, ']', '')))
  ) sub
) s
WHERE val <> ''
GROUP BY id, val
HAVING n <> MAX(n)
ORDER BY id,n;

SqlFiddleDemo

Output:

╔═════╦═════════════╦════╗
║ id  ║    val      ║ n  ║
╠═════╬═════════════╬════╣
║  1  ║ CE31285LV4  ║  1 ║
║  1  ║ D32E        ║  3 ║
║  1  ║ GTX750      ║  5 ║
║  1  ║ M256S       ║  7 ║
║  1  ║ H2X1T       ║  9 ║
║  1  ║ FMLANE4U4   ║ 11 ║
╚═════╩═════════════╩════╝

EDIT 3:

What exactly is done there? Why do you need n

CROSS JOIN and entire subquery is only tally table. That'is all. If MySQL has function to generate number sequence (like generate_series or prepopulated number table there is no need for CROSS JOIN.

Number table is needed for SUBSTRING_INDEX:

SUBSTRING_INDEX(str,delim,count)

Returns the substring from string str before count occurrences of the delimiter delim. If count is positive, everything to the left of the final delimiter (counting from the left) is returned. If count is negative, everything to the right of the final delimiter (counting from the right) is returned. SUBSTRING_INDEX() performs a case-sensitive match when searching for delim.

Sign up to request clarification or add additional context in comments.

4 Comments

thank you. Can you explain how it works? I think you expanded my initiate intention. I need it to work on a specific string. Not on wide table... If it have 5 relevant rows in the table I will run the query 5 times. one time for each row.. I need the result of query per row.. not combined. The string datatype is longtext
see sqlfiddle.com/#!9/8ee95/1 your query works partially for my data. I also changed the type to longtext.
@mm Check updated. And for next time, please show your real data first, not oversimplified example :)
Thank you. Can you explain the cross join? What exactly is done there? Why do you need n ? and why it runs from 1 to 1000? does it means this support up to 1000 rows in result? If I read it correctly the cross join only create a sequence number series?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.