
I hope I can ask this question in a way that makes some sense, as the data itself is not good. However, as it was written in the 1950s, I have no option but to make the best of it and bring it into the current century for those of us who need it. Also, benchmark speed is not an issue: this is run manually and only rarely, and always on my local PC rather than on the live server. The resulting table is then uploaded.

To outline: it looks in the Models field of parts_listing for any rows containing the ordinal numbers 22ND, 23RD, etc. If there are any, it looks in parts_modelno for all chassisNo values that begin with the matching cardinal number (22, 23, etc.) and populates parts_parsed with the rows from the SELECT statement. It does that, but it gets every row from parts_modelno, not just those that match.

The first query below works well, but it needs a separate copy for each ordinal/cardinal pair: 22ND/22, 23RD/23, 24TH/24, 25TH/25, 26TH/26 and 54TH/54.

The Models field being queried in parts_listing contains data that varies considerably, and another query has already extracted the bulk of it. For example, 2250-51-52-55-70-71 is shorthand for 2250, 2251, 2252, 2255, 2270 and 2271, and there are others such as the ones below, each example in its own row. Those have already been parsed into parts_parsed. The ones with BODY (and similar ones with MODEL) are not being parsed properly, but that's another issue and not important here.

2662-92; 5462-92
ALL 22ND; 2301-02-13-32
LHD, 2401-02-13; 2501-02-13-31; 2601-02-11; 5400-01-02-11
2201 (BODY 2293)

and quite a few other variations. These codes were parsed in an earlier operation and are already in parts_parsed. Only the Models field in parts_listing is still in its original form, and this query searches it for the ordinals so that the data can be completed with those values. Without it, if a row contains ALL 22ND (or one or more of the other ordinals), none of the corresponding entries appear, and that's what I am trying to fix by running this additional query.
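As an aside, the hyphen shorthand described above can be expanded in a few lines of Python. This is a minimal sketch, not the query that actually populated parts_parsed; it assumes the first number is always a full four-digit model and that every following suffix is two digits replacing its last two digits. Forms such as ALL 22ND or 2201 (BODY 2293) are not handled.

```python
def expand_shortcut(term: str) -> list[str]:
    """Expand hyphen shorthand such as '2250-51-52-55-70-71'.

    Assumption: the first number is a full four-digit model number and each
    following two-digit suffix replaces its last two digits.
    """
    first, *suffixes = [p.strip() for p in term.split("-")]
    return [first] + [first[:2] + s for s in suffixes]


print(expand_shortcut("2250-51-52-55-70-71"))
# ['2250', '2251', '2252', '2255', '2270', '2271']
```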

INSERT INTO parts_parsed (pageNo, baseGroup, partID, partNo, modelNo) 
SELECT PageNo AS pageNo, BaseGroup AS baseGroup, pl.ID AS partID, PartNo AS partNo, chassisNo AS modelNo 
FROM parts_listing pl, parts_modelno pm 
WHERE Models LIKE '%22ND%' 
AND chassisNo LIKE '22%' 
AND BaseGroup NOT IN (SELECT GroupNo FROM parts_reftype WHERE BodyChassis = 1);

Following is what I tried in order to simplify it, but it returns a row for every combination of any of 22ND, 23RD, 24TH, 25TH, 26TH, 54TH with any of 22, 23, 24, 25, 26, 54, rather than just those matching 22ND with 22 (and so on). I understand why, but I am unsure what to do about it.

INSERT INTO parts_parsed (pageNo, baseGroup, partID, partNo, modelNo) 
SELECT PageNo AS pageNo, BaseGroup AS baseGroup, pl.ID AS partID, PartNo AS partNo, chassisNo AS modelNo 
FROM parts_listing pl, parts_modelno pm
WHERE Models REGEXP '22ND|23RD|24TH|25TH|26TH|54TH'
AND chassisNo REGEXP '22|23|24|25|26|54'
AND BaseGroup NOT IN (SELECT GroupNo FROM parts_reftype WHERE BodyChassis = 1);
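To see why, here is a small reproduction using Python's built-in sqlite3 module as a stand-in for MySQL, with hypothetical toy rows. SQLite has no built-in REGEXP, so one is registered using Python's re.search, which, like MySQL's REGEXP, matches anywhere in the string. Because the two REGEXP conditions are evaluated independently, every listing containing any ordinal is paired with every chassis number containing any of the cardinals:

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite has no built-in REGEXP: "X REGEXP Y" calls regexp(Y, X), so the
# pattern arrives first. re.search, like MySQL's REGEXP, matches anywhere.
conn.create_function(
    "REGEXP", 2, lambda pat, s: s is not None and re.search(pat, s) is not None
)

# Hypothetical toy rows, loosely modelled on the samples in the question.
conn.executescript("""
    CREATE TABLE parts_listing (ID INTEGER, Models TEXT);
    CREATE TABLE parts_modelno (chassisNo TEXT);
    INSERT INTO parts_listing VALUES (1, 'ALL 22ND; 2301-02-13-32'),
                                     (2, '2469; ALL 26TH');
    INSERT INTO parts_modelno VALUES ('2213'), ('2226'), ('2301'), ('2602');
""")

# The simplified query: the two REGEXP conditions know nothing about each other.
rows = conn.execute("""
    SELECT pl.ID, pl.Models, pm.chassisNo
      FROM parts_listing pl, parts_modelno pm
     WHERE pl.Models REGEXP '22ND|23RD|24TH|25TH|26TH|54TH'
       AND pm.chassisNo REGEXP '22|23|24|25|26|54'
""").fetchall()

for row in rows:
    print(row)
# 2 listings x 4 chassis numbers = 8 rows, including a 22ND listing
# paired with the 26xx chassis number.
```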

The structure for parts_parsed is below, followed by a small sample of the data (there are over 450k rows).

CREATE TABLE IF NOT EXISTS `parts_parsed` (
`ID` int unsigned NOT NULL AUTO_INCREMENT,
`pageNo` int unsigned NOT NULL,
`baseGroup` int unsigned NOT NULL,
`partID` int unsigned NOT NULL,
`partNo` varchar(20) NOT NULL,
`modelNo` smallint unsigned DEFAULT NULL,
`bodyNo` smallint unsigned DEFAULT NULL,
`isRHD` tinyint unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`ID`)
) ENGINE=INNODB AUTO_INCREMENT=1 DEFAULT CHARSET=UTF8MB4 COLLATE=UTF8MB4_GENERAL_CI;

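+----+--------+-----------+--------+--------+---------+--------+-------+
| ID | pageNo | baseGroup | partID | partNo | modelNo | bodyNo | isRHD |
+----+--------+-----------+--------+--------+---------+--------+-------+
|  1 |      1 |         0 |      1 | 391906 |    2201 | NULL   |     0 |
|  2 |      1 |         0 |      1 | 391906 |    2202 | NULL   |     0 |
|  3 |      1 |         0 |      1 | 391906 |    2211 | NULL   |     0 |
|  4 |      1 |         0 |      1 | 391906 |    2220 | NULL   |     0 |
|  5 |      1 |         0 |      1 | 391906 |    2222 | NULL   |     0 |
|  6 |      1 |         0 |      1 | 391906 |    2232 | NULL   |     0 |
|  7 |      1 |         0 |      1 | 391906 |    2240 | NULL   |     0 |
|  8 |      1 |         0 |      1 | 391906 |    2301 | NULL   |     0 |
|  9 |      1 |         0 |      2 | 391907 |    2306 | NULL   |     0 |
| 10 |      1 |         0 |      2 | 391907 |    2326 | NULL   |     0 |
+----+--------+-----------+--------+--------+---------+--------+-------+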

The structure of parts_modelno is below, followed by a small sample of data.

DROP TABLE IF EXISTS `parts_modelno`;
CREATE TABLE IF NOT EXISTS `parts_modelno` (
  `ID` smallint unsigned NOT NULL AUTO_INCREMENT,
  `seriesYear` varchar(8) DEFAULT NULL,
  `bodyNo` varchar(8) DEFAULT NULL,
  `chassisNo` varchar(8) DEFAULT NULL,
  `engineNo` varchar(8) DEFAULT NULL,
  `modelDesc` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`ID`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=UTF8MB4 COLLATE=UTF8MB4_GENERAL_CI;

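+----+------------+--------+-----------+
| ID | seriesYear | bodyNo | chassisNo |
+----+------------+--------+-----------+
|  1 | 1948-49    | 2213   | 2213      |
|  2 | 1948-49    | 2250   | 2226      |
|  3 | 1948-49    | 2251   | 2226      |
|  4 | 1948-49    | 2252   | 2206      |
|  5 | 1948-49    | 2255   | 2206      |
| 26 | 1949-50    | 2365   | 2301      |
| 27 | 1949-50    | 2372   | 2302      |
| 28 | 1949-50    | 2375   | 2302      |
| 29 | 1949-50    | 2379   | 2332      |
| 30 | 1949-50    | 2382   | 2302      |
+----+------------+--------+-----------+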

And finally as requested, here is the structure of parts_listing followed by a very short sample selected at random.

DROP TABLE IF EXISTS `parts_listing`;
CREATE TABLE IF NOT EXISTS `parts_listing` (
  `ID` int unsigned NOT NULL AUTO_INCREMENT,
  `BaseGroup` int unsigned DEFAULT NULL,
  `GroupNumber` varchar(20) DEFAULT NULL,
  `BaseName` varchar(50) DEFAULT NULL,
  `GroupName` varchar(50) DEFAULT NULL,
  `Name` varchar(100) DEFAULT NULL,
  `PartNo` varchar(30) DEFAULT NULL,
  `Models` varchar(255) DEFAULT NULL,
  `Description` varchar(255) DEFAULT NULL,
  `Quantity` smallint unsigned DEFAULT NULL,
  `PageNo` int DEFAULT NULL,
  `SubPage` varchar(5) DEFAULT NULL,
  `RevDate` int DEFAULT NULL,
  `Edition` varchar(50) DEFAULT NULL,
  PRIMARY KEY (`ID`)
) ENGINE=INNODB AUTO_INCREMENT=1 DEFAULT CHARSET=UTF8MB4 COLLATE=UTF8MB4_GENERAL_CI;

ID  BaseGroup   GroupNumber BaseName    GroupName   Name    PartNo  Models  Description Quantity    PageNo  SubPage RevDate Edition
13570   30  30.46796    BODY    DOORS-DOOR REAR TRIM PANEL ASSY "SPECIFY TRIM SET NO"   453674  5450-51 RIGHT, SET 50-52    1   402     -490665600  48-54
1850    3   3.236   CLUTCH AND TRANSMISSION TRANSMISSION    FLANGE-DRIVING SHAFT UNIVERSAL JOINT    302868  2213-20-22-26; 2313     1   58      -490665600  48-54
16314   30  30.874934   BODY    RADIATOR GRILLE AND SPLASHER    BRACKET-RAD SIDE SPLASHER BRACE FRAME   G120370 2469; 2579; ALL 26TH; 54TH  NUT 7/16-20 2   469     -490665600  48-54
14633   30  30.666983   BODY    ELECTRICAL-ROOF LIGHT   GROMMET-ROOF LIGHT CABLE    403834  2280-86     1   423     -504921600  48-54
12273   30  30.34395    BODY    DOORS-DOOR FRONT    MOULDING-TRIM PANEL 442793  2677-79-97; 5467-97 INTERMEDIATE, LOWER, RIGHT  1   369     -490665600  48-54

Additional examples of Models can be found here.

  • even with regex you must combine every combination of model and chassis and join them with OR Commented Sep 6, 2021 at 0:38
  • The data sample for table parts_parsed only filled into 7 out of 8 columns, which column is empty? Commented Sep 6, 2021 at 0:39
  • I'm assuming that you're trying to populate the parts_parsed table with data from the other tables, but it's incorrect or something? Or are you trying to get an output out of the parts_parsed table? Commented Sep 6, 2021 at 0:48
  • I'll try to clarify in the original post but to answer here, it is looking in the Models field in parts_listing to see any with the ordinal numbers 22ND, 23RD etc. If there are any, then it looks in parts_modelno to find all that begin with the cardinal number 22, 23 etc. It takes all that match and populates parts_parsed with the rows in the select statement. It is doing that but it is getting all from parts_modelno, not just those that match. Commented Sep 6, 2021 at 1:01
  • Agreed. The test data wasn't in a usable form, so I mainly had to assume @DonP's assumptions were valid. It's a risk. But without explicit INSERTs (unambiguous data) to cover the cases of interest, we're left to guess. Commented Sep 6, 2021 at 3:27

2 Answers


There are 3 slightly different versions of the solution here, for various versions of MySQL and MariaDB.

MariaDB 10.5 is compatible with all 3 solutions, MySQL 8.0 with the first two, and MySQL 5.7 with only the first one.

The pairs derived table (or WITH clause term) provides the pairs of patterns ('25TH', '^25'), etc.: the ordinal to look for in Models, together with a pattern anchored to the start of chassisNo.

Once we have that, it's just a matter of joining with that list in your original SELECT query expression, used to generate the rows to be inserted.

-- MariaDB 10.5, MySQL 8.0, and MySQL 5.7

INSERT INTO parts_parsed (pageNo, baseGroup, partID, partNo, modelNo) 
SELECT PageNo AS pageNo, BaseGroup AS baseGroup
     , pl.ID AS partID, PartNo AS partNo
     , chassisNo AS modelNo 
  FROM parts_listing pl
     , parts_modelno pm
     , (
         SELECT '22ND' AS p1, '^22' AS p2   UNION
         SELECT '23RD', '^23'               UNION
         SELECT '24TH', '^24'               UNION
         SELECT '25TH', '^25'               UNION
         SELECT '26TH', '^26'               UNION
         SELECT '54TH', '^54'               -- etc
       ) AS pairs
 WHERE Models REGEXP p1
   AND chassisNo REGEXP p2
   AND BaseGroup NOT IN (SELECT GroupNo FROM parts_reftype WHERE BodyChassis = 1)
;

Test case for MySQL 5.7, 8.0 and MariaDB 10.5
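The derived-table version above can also be sanity-checked offline. This sketch reuses the same kind of hypothetical toy rows, with Python's sqlite3 standing in for MySQL and a REGEXP function registered via re.search (SQLite has none built in). Each listing now pairs only with chassis numbers whose prefix belongs to its own ordinal:

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
# "X REGEXP Y" in SQLite calls regexp(Y, X): pattern first, subject second.
conn.create_function(
    "REGEXP", 2, lambda pat, s: s is not None and re.search(pat, s) is not None
)
# Hypothetical toy rows, loosely modelled on the samples in the question.
conn.executescript("""
    CREATE TABLE parts_listing (ID INTEGER, Models TEXT);
    CREATE TABLE parts_modelno (chassisNo TEXT);
    INSERT INTO parts_listing VALUES (1, 'ALL 22ND; 2301-02-13-32'),
                                     (2, '2469; ALL 26TH');
    INSERT INTO parts_modelno VALUES ('2213'), ('2226'), ('2301'), ('2602');
""")

# Same shape as the MySQL 5.7 answer: a derived table of (ordinal, prefix) pairs
# ties each ordinal to its own chassisNo prefix.
rows = conn.execute("""
    SELECT pl.ID, pm.chassisNo, pairs.p1
      FROM parts_listing pl
         , parts_modelno pm
         , (
             SELECT '22ND' AS p1, '^22' AS p2 UNION
             SELECT '23RD', '^23'             UNION
             SELECT '24TH', '^24'             UNION
             SELECT '25TH', '^25'             UNION
             SELECT '26TH', '^26'             UNION
             SELECT '54TH', '^54'
           ) AS pairs
     WHERE pl.Models REGEXP pairs.p1
       AND pm.chassisNo REGEXP pairs.p2
     ORDER BY pl.ID, pm.chassisNo
""").fetchall()

for row in rows:
    print(row)
# (1, '2213', '22ND'), (1, '2226', '22ND'), (2, '2602', '26TH')
```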

-- MariaDB 10.5, and MySQL 8.0 

INSERT INTO parts_parsed (pageNo, baseGroup, partID, partNo, modelNo) 
WITH pairs (p1, p2) AS (
         SELECT '22ND' AS p1, '^22' AS p2   UNION
         SELECT '23RD', '^23'               UNION
         SELECT '24TH', '^24'               UNION
         SELECT '25TH', '^25'               UNION
         SELECT '26TH', '^26'               UNION
         SELECT '54TH', '^54'               -- etc
     )
SELECT PageNo AS pageNo, BaseGroup AS baseGroup
     , pl.ID AS partID, PartNo AS partNo
     , chassisNo AS modelNo 
  FROM parts_listing pl
     , parts_modelno pm
     , pairs
 WHERE Models REGEXP p1
   AND chassisNo REGEXP p2
   AND BaseGroup NOT IN (SELECT GroupNo FROM parts_reftype WHERE BodyChassis = 1)
;

Test case for MySQL 8.0 and MariaDB 10.5 (updated)

-- For MariaDB 10.5:

INSERT INTO parts_parsed (pageNo, baseGroup, partID, partNo, modelNo) 
WITH pairs (p1, p2) AS (
         SELECT * FROM (
                         VALUES ('22ND', '^22')
                              , ('23RD', '^23')
                              , ('24TH', '^24')
                              , ('25TH', '^25')
                              , ('26TH', '^26')
                              , ('54TH', '^54')
                       ) AS x
     )
SELECT PageNo AS pageNo, BaseGroup AS baseGroup
     , pl.ID AS partID, PartNo AS partNo
     , chassisNo AS modelNo 
  FROM parts_listing pl
     , parts_modelno pm
     , pairs
 WHERE Models    REGEXP p1
   AND chassisNo REGEXP p2
   AND BaseGroup NOT IN (SELECT GroupNo FROM parts_reftype WHERE BodyChassis = 1)
;

Test case for MariaDB 10.5

MySQL 8.0 (and 5.7) had problems with the table value constructor FROM (VALUES (), (), ()) AS x, so for MySQL 8.0 and 5.7 we replaced it with a simple UNION list. (MySQL 8.0.19 and later do have a table value constructor, but it requires the VALUES ROW(...), ROW(...) form.) MySQL 5.7 does not support the WITH clause, so there we replaced that with a derived table. The 5.7 version works on all 3 (MariaDB 10.5, MySQL 5.7 and 8.0).
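Since the UNION list has to be written out by hand for MySQL 5.7 and 8.0, a short script can generate it for any set of ordinals. This is a hypothetical helper, not part of the answer; it assumes every ordinal is digits followed by ST, ND, RD or TH, and maps each one to the pair (ordinal, '^' + digits) used above:

```python
import re


def pairs_union_sql(ordinals):
    """Build the body of the 'pairs' derived table for MySQL 5.7/8.0.

    Hypothetical helper: '22ND' becomes the pair ('22ND', '^22'), i.e. the
    text searched for in Models and a pattern anchored to the start of chassisNo.
    """
    rows = []
    for ordinal in ordinals:
        m = re.fullmatch(r"(\d+)(ST|ND|RD|TH)", ordinal)
        if m is None:
            raise ValueError(f"not an ordinal: {ordinal!r}")
        if not rows:
            # Only the first SELECT needs the column aliases.
            rows.append(f"SELECT '{ordinal}' AS p1, '^{m.group(1)}' AS p2")
        else:
            rows.append(f"SELECT '{ordinal}', '^{m.group(1)}'")
    return " UNION\n".join(rows)


print(pairs_union_sql(["22ND", "23RD", "24TH", "25TH", "26TH", "54TH"]))
```

The printed text can be pasted between the parentheses of the pairs derived table.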

Hopefully, I didn't insert any typos while trying to provide the above detail.

I noticed the continuing conversation about generating patterns, which wasn't the focus of this question. If you have a question covering that, feel free to mention it. Here's something I worked up a few days ago while thinking about your general problem. It focuses on just a few kinds of expressions you used that are intended to generate patterns, and shows how some of them might be handled in one expression. The data is self-contained in the following query:

WITH RECURSIVE seq (n) AS (
            SELECT 1
             UNION ALL
            SELECT n + 1 FROM seq WHERE n <= 9
     )
   , args (arg) AS (
         SELECT '2213-20-22-26; 2313; 22ND; 23RD; 24TH; 25TH; 26TH; 54TH' AS arg UNION
         SELECT '2210-21-23; 2311; 22ND; 29TH; 51ST'
     )
   , norm (term, arg, n) AS (
            SELECT TRIM(REPLACE(TRIM(LEADING SUBSTRING_INDEX(t1.arg,';',seq.n-1) FROM SUBSTRING_INDEX(t1.arg,';',seq.n)), ';','')) AS term
                 , t1.arg
                 , n
              FROM args AS t1
              JOIN seq
                ON seq.n > 0 AND SUBSTRING_INDEX(t1.arg,';',seq.n-1) <> SUBSTRING_INDEX(t1.arg,';',seq.n)
             ORDER BY seq.n
     )
   , pattern1 (term, arg, n, pat) AS (
            SELECT t.term, t.arg, t.n
                 , CASE WHEN LENGTH(term) = 4
                        THEN
                             CASE WHEN SUBSTRING(term, -1) BETWEEN '0' AND '9'
                                  THEN CONCAT('^', term, '$')
                                  ELSE CONCAT('^', SUBSTRING(term, 1, 2))
                              END
                    END AS pat
             FROM norm AS t
     )
   , norm2 (term, arg, n, pat) AS (
            SELECT t1.term
                 , t1.arg
                 , seq.n
                 , CONCAT('^', TRIM(REPLACE(TRIM(LEADING SUBSTRING_INDEX(t1.term,'-',seq.n-1) FROM SUBSTRING_INDEX(t1.term,'-',seq.n)), '-','')), '$') AS tag
              FROM pattern1 AS t1
              JOIN seq
                ON seq.n = 1
               AND t1.pat IS NULL
             UNION ALL
            SELECT t1.term
                 , t1.arg
                 , seq.n
                 , CONCAT('^', SUBSTRING(t1.term, 1, 2), TRIM(REPLACE(TRIM(LEADING SUBSTRING_INDEX(t1.term,'-',seq.n-1) FROM SUBSTRING_INDEX(t1.term,'-',seq.n)), '-','')), '$') AS tag
              FROM pattern1 AS t1
              JOIN seq
                ON seq.n > 1 AND SUBSTRING_INDEX(t1.term,'-',seq.n-1) <> SUBSTRING_INDEX(t1.term,'-',seq.n)
               AND t1.pat IS NULL
             UNION ALL
            SELECT t1.*
              FROM pattern1 AS t1
             WHERE t1.pat IS NOT NULL
     )
SELECT *
  FROM norm2
 ORDER BY arg, term, n
;

Result containing the generated patterns:

+---------------+---------------------------------------------------------+------+--------+
| term          | arg                                                     | n    | pat    |
+---------------+---------------------------------------------------------+------+--------+
| 2210-21-23    | 2210-21-23; 2311; 22ND; 29TH; 51ST                      |    1 | ^2210$ |
| 2210-21-23    | 2210-21-23; 2311; 22ND; 29TH; 51ST                      |    2 | ^2221$ |
| 2210-21-23    | 2210-21-23; 2311; 22ND; 29TH; 51ST                      |    3 | ^2223$ |
| 22ND          | 2210-21-23; 2311; 22ND; 29TH; 51ST                      |    3 | ^22    |
| 2311          | 2210-21-23; 2311; 22ND; 29TH; 51ST                      |    2 | ^2311$ |
| 29TH          | 2210-21-23; 2311; 22ND; 29TH; 51ST                      |    4 | ^29    |
| 51ST          | 2210-21-23; 2311; 22ND; 29TH; 51ST                      |    5 | ^51    |
| 2213-20-22-26 | 2213-20-22-26; 2313; 22ND; 23RD; 24TH; 25TH; 26TH; 54TH |    1 | ^2213$ |
| 2213-20-22-26 | 2213-20-22-26; 2313; 22ND; 23RD; 24TH; 25TH; 26TH; 54TH |    2 | ^2220$ |
| 2213-20-22-26 | 2213-20-22-26; 2313; 22ND; 23RD; 24TH; 25TH; 26TH; 54TH |    3 | ^2222$ |
| 2213-20-22-26 | 2213-20-22-26; 2313; 22ND; 23RD; 24TH; 25TH; 26TH; 54TH |    4 | ^2226$ |
| 22ND          | 2213-20-22-26; 2313; 22ND; 23RD; 24TH; 25TH; 26TH; 54TH |    3 | ^22    |
| 2313          | 2213-20-22-26; 2313; 22ND; 23RD; 24TH; 25TH; 26TH; 54TH |    2 | ^2313$ |
| 23RD          | 2213-20-22-26; 2313; 22ND; 23RD; 24TH; 25TH; 26TH; 54TH |    4 | ^23    |
| 24TH          | 2213-20-22-26; 2313; 22ND; 23RD; 24TH; 25TH; 26TH; 54TH |    5 | ^24    |
| 25TH          | 2213-20-22-26; 2313; 22ND; 23RD; 24TH; 25TH; 26TH; 54TH |    6 | ^25    |
| 26TH          | 2213-20-22-26; 2313; 22ND; 23RD; 24TH; 25TH; 26TH; 54TH |    7 | ^26    |
| 54TH          | 2213-20-22-26; 2313; 22ND; 23RD; 24TH; 25TH; 26TH; 54TH |    8 | ^54    |
+---------------+---------------------------------------------------------+------+--------+
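To cross-check these generated patterns without a database, here is a rough Python mirror of the pattern1/norm2 rules, for illustration only: a four-character term ending in a digit becomes an exact match, a four-character term ending in a letter becomes a two-digit prefix, and anything else is treated as hyphen shorthand. As far as I can tell, terms such as ALL 22ND fall through to the hyphen branch here just as they do in the SQL. Output order follows the input rather than the query's ORDER BY.

```python
def patterns(arg: str) -> list[tuple[str, str]]:
    """Rough Python mirror of the pattern1/norm2 CTE logic (illustration only)."""
    out = []
    for term in (t.strip() for t in arg.split(";")):
        if len(term) == 4:
            if term[-1].isdigit():
                out.append((term, f"^{term}$"))     # e.g. 2313 -> ^2313$
            else:
                out.append((term, f"^{term[:2]}"))  # e.g. 22ND -> ^22
        else:
            # Hyphen shorthand, e.g. 2210-21-23 -> ^2210$, ^2221$, ^2223$
            first, *suffixes = [s.strip() for s in term.split("-")]
            out.append((term, f"^{first}$"))
            out.extend((term, f"^{first[:2]}{s}$") for s in suffixes)
    return out


for term, pat in patterns("2210-21-23; 2311; 22ND; 29TH; 51ST"):
    print(f"{term:<12} {pat}")
```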

23 Comments

That looks very promising but how does it fetch the 23RD, 24TH etc? Also, it gives a syntax error on line 3.
MySQL apparently updated itself as I see it's running version 8 but the live server is 7.
You'll need to use a derived table for MySQL 5.7. This was just the first two pairs in the list. You can add the rest similarly. I can show you that too. I'll first provide a MySQL 5.7 version, so that you don't have the syntax issues of the unsupported WITH clause.
@DonP I added a version for MySQL 5.7, which should be fine with later versions and MariaDB. I added another UNION term to provide one more pair in your list, as another example, with a link to the updated fiddle.
No need for MySQL 5.7 as I'm not running that. I'll see if I can figure out how to add pairs, but running it as is, it very quickly generated 48,810 rows of data, which seems like a lot for only 22ND!

I went with a different approach. I believe it's achievable with a short query, although one with layers of functions. I'll try to explain it step by step, and hopefully I don't miss anything.

The idea is simple: first split the values in the Models column of parts_listing, return each value as its own row, and finally match the first two digits of each extracted value against the parts_modelno table. Here is the query:

INSERT INTO parts_parsed (pageNo, baseGroup, partID, partNo, modelNo) 
/*Part 1*/
WITH RECURSIVE seq AS(
    SELECT 1 sn UNION ALL
    SELECT sn+1 FROM seq WHERE sn+1 <= 100)
/*Part 3*/
SELECT PageNo AS pageNo, BaseGroup , pl.ID AS partID, PartNo, chassisNo AS modelNo 
   FROM
  (/*Part 2*/ 
     SELECT *,
        REGEXP_REPLACE(
            SUBSTRING_INDEX(
                 SUBSTRING_INDEX(Models,';',sn),';',-1),'[^0-9]','') sval
       FROM seq JOIN parts_listing pl
         ON sn <= (LENGTH(Models)-LENGTH(REPLACE(Models,';','')))+1) pl
       JOIN parts_modelno pm 
         ON (LEFT(pl.sval,2)=LEFT(pm.bodyNo,2) OR LEFT(pl.sval,2)=LEFT(pm.chassisNo,2))
GROUP BY PageNo, BaseGroup, partID, PartNo, modelNo
ORDER BY PageNo, BaseGroup, partID, PartNo, modelNo ;

As you can see, there's a lot going on there. I have marked it in 3 parts, and that's how I'll attempt to explain it.

Part 1:
Create a number sequence using a WITH RECURSIVE common table expression. (In MariaDB there's a very convenient alternative, the Sequence storage engine: SELECT seq FROM seq_1_to_100 returns rows numbered 1 to 100.) The generated sequence is used in two places: as the count argument of SUBSTRING_INDEX(str,delim,count), and as the filter in the JOIN's ON clause.

Part 2:

REGEXP_REPLACE(
            SUBSTRING_INDEX(
                 SUBSTRING_INDEX(Models,';',sn),';',-1),'[^0-9]','') sval
  1. Assuming that the values in Models are all separated with ;, we use ; as the delimiter in the SUBSTRING_INDEX(str,delim,count) function.
  2. REGEXP_REPLACE removes all non-numeric characters, so only digits are returned.

The ON part was also a bit of work, as it has to work out how many values are packed into Models so that each can be returned as a row of its own. The basic idea is to take the LENGTH() of the original Models value and subtract the LENGTH() of Models with the delimiter (;) removed. Since ; is a single character, the difference is the number of delimiters: one ; gives 1, which actually means there are two values separated by it. That is why there's a +1 at the end of the subtraction.

ON sn <= (LENGTH(Models)-LENGTH(REPLACE(Models,';','')))+1) pl
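The delimiter counting and the nested SUBSTRING_INDEX calls can be checked outside the database with a rough Python stand-in. Both functions are hypothetical helpers written for illustration; substring_index only approximates MySQL's SUBSTRING_INDEX, and the length trick counts delimiters correctly only because ; is a single character. Note that every segment after the first keeps the space that follows its ;.

```python
def substring_index(s: str, delim: str, count: int) -> str:
    """Approximation of MySQL SUBSTRING_INDEX(str, delim, count)."""
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])  # left of the count-th delimiter
    if count < 0:
        return delim.join(parts[count:])  # right of the count-th delimiter from the end
    return ""


def split_models(models: str) -> list[str]:
    # LENGTH(Models) - LENGTH(REPLACE(Models, ';', '')) + 1
    n_segments = len(models) - len(models.replace(";", "")) + 1
    # One "row" per sn in 1..n_segments, as the JOIN with seq produces.
    return [
        substring_index(substring_index(models, ";", sn), ";", -1)
        for sn in range(1, n_segments + 1)
    ]


print(split_models("2469; 2579; ALL 26TH; 54TH"))
# ['2469', ' 2579', ' ALL 26TH', ' 54TH']
```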

Part 3:
Use the query above as a subquery and JOIN it with the parts_modelno table. The ON condition compares the first two digits of each value extracted from Models with the first two characters of bodyNo or chassisNo. This, I believe, is a straightforward operation.
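Because every segment after the first keeps the space that follows its ;, it matters whether REGEXP_REPLACE strips only letters ('[a-zA-Z]') or every non-digit ('[^0-9]', as in the Part 2 excerpt) before the first two characters are compared. Here is a small Python sketch of that comparison, with re.sub standing in for REGEXP_REPLACE and made-up chassis numbers loosely based on the parts_modelno sample:

```python
import re


def two_digit_key(segment: str, strip_pattern: str) -> str:
    # Mirrors LEFT(REGEXP_REPLACE(segment, strip_pattern, ''), 2).
    return re.sub(strip_pattern, "", segment)[:2]


# Hypothetical chassis numbers, loosely based on the parts_modelno sample.
chassis = ["2206", "2213", "2226", "2301", "2302", "2332"]
segment = " 2313"  # second segment of '2213-20-22-26; 2313', leading space included

for pattern in ("[a-zA-Z]", "[^0-9]"):
    key = two_digit_key(segment, pattern)
    matches = [c for c in chassis if c[:2] == key]
    print(repr(pattern), repr(key), matches)
# '[a-zA-Z]' ' 2' []
# '[^0-9]' '23' ['2301', '2302', '2332']
```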

Demo fiddle

21 Comments

Thank you! I’m traveling today back home to California from my Oregon shop but once I get back and get my PC there synced with the work done here, I’ll give it a try. Yes, the data chunks are separated by ; but there are also items such as RHD, (with the comma) or (EXCEPT BODY 2313) OR (BODIES 2313-33) which, for this work, are being ignored. These use the same structure as the rest and I wanted to mention them only in case the other characters might be an issue. It would be great to be able to filter for the bodies or chassis being included or excluded but for now, that is not too important.
Yes, I'm very curious to know if the query will work for you. I'm also interested in the other type of data you mentioned but it's probably easy for you to filter if you know what you're filtering beforehand and the values are consistent.
I am too but couldn’t copy and paste on an iPad from the fiddle to try it on my own database against real data so will have to wait until I have a PC available.
Yeah, that's OK. It's best to run it on a proper machine for testing. Besides, the post will still be here unless it's deleted, so we can always come back to comment or edit in the future.
On a PC finally, and I ran the query, which seems to run reasonably quickly. I have not yet tested it for accuracy other than visually, but it appears to replace the query answered here: stackoverflow.com/questions/57844393/… rather than the one in this question. If that's the case, it's missing a few things. One is RHD vs LHD: some rows have RHD or LHD, meaning that the part works only on Right Hand or Left Hand drive vehicles. Also, not in either question, two or three BaseGroups (29, 30 and 31) need bodyNo populated rather than modelNo.