I'm working with a large set of legacy data (converted from a flat-file db), where a field is formatted as the last 2 digits of the year the record was entered, followed by a 4 digit increment...

e.g., the third record created in 1998 would be "980003", and the eleventh record created in 2004 would be "040011".

I cannot change these values: they're in use throughout the company and are registered with the state, clients, etc. I know it'd be great to split the year and the increment into separate columns, but that's not possible. I can't even really do it "internally", since each row has about 300 fields that are all sortable, and the users are very used to working with this field as a record identifier.

So I'm trying to implement a MySQL UDF (for the first time) to sort. The query executes successfully, and it allows me to "select whatever from table order by custom_sort(whatever)", but the order is not what I'd expect.

Here's what I'm using:

DELIMITER //

CREATE FUNCTION custom_sort(id VARCHAR(8))
    RETURNS INT
    READS SQL DATA
    DETERMINISTIC
    BEGIN
        DECLARE year VARCHAR(2);
        DECLARE balance VARCHAR(6);
        DECLARE stringValue VARCHAR(8);
        SET year = SUBSTRING(0, 2,  id);
        SET balance = SUBSTRING(2, 6, id);
        IF(year <= 96) THEN
            SET stringValue = CONCAT('20', year, balance);
        ELSE
            SET stringValue = CONCAT('19', year, balance);
        END IF;
        RETURN CAST(stringValue as UNSIGNED);
    END//

The records only go back to 96 (thus the arbitrary "if the first 2 characters are less than 96, prepend '20', otherwise prepend '19'"). I'm not thrilled with this bit, but I don't believe that's where the core problem is.

To throw another wrench in the works, it turns out that the ids from 1996 and 1997 are all 5 digits long. They follow the same pattern described above, but with a 3 digit increment instead of a 4 digit one. Again, I suspect this will be a problem, but it's not the core problem.

An example of the returns I'm getting with this custom_sort:

001471
051047
080628
040285
110877
020867
090744
001537
051111
080692
040349
110941
020931
090808
001603
051175

I really have no idea what I'm doing here and have never used MySQL for a UDF like this - any help would be appreciated.

TYIA

/EDIT typo

/EDIT 2 concat needed "year" value added - still getting same results

  • How does MySQL know that you are casting to an UNSIGNED INT and not, for example, an UNSIGNED TINYINT? Does it know, or does it assume some default numeric type? Commented Feb 18, 2012 at 7:28
  • Isn't it possible to append a new column to the db with the fixed values? Anyway, for the different number length problem, you could use the LPAD function: dev.mysql.com/doc/refman/5.1/en/… Commented Feb 18, 2012 at 8:03
  • @biziclop no - in the UI the users have access to all the columns and can edit them, sort each, etc... admins can even add columns. i have to keep the structure as is. each column is expressed in the UI as a column that can be clicked, dragged, etc Commented Feb 18, 2012 at 8:07

1 Answer

You have some problems with your substrings: MySQL expects SUBSTRING(str, pos, len), with the string first and positions counted from 1. The cast to int at the end also makes it sort values with more digits at the end, not by year. This should work better:

DELIMITER //

CREATE FUNCTION custom_sort(id VARCHAR(8))
    RETURNS VARCHAR(10)
    READS SQL DATA
    DETERMINISTIC
    BEGIN
        DECLARE year VARCHAR(2);
        DECLARE balance VARCHAR(6);
        DECLARE stringValue VARCHAR(10);
        SET year = SUBSTRING(id, 1, 2);
        SET balance = SUBSTRING(id, 3, 6);
        IF(year <= 96) THEN
            SET stringValue = CONCAT('20', year, balance);
        ELSE
            SET stringValue = CONCAT('19', year, balance);
        END IF;
        RETURN stringValue;
    END//

DELIMITER ;

This can be simplified a bit to:

DELIMITER //

CREATE FUNCTION custom_sort(id VARCHAR(8))
    RETURNS varchar(10)
    DETERMINISTIC
    BEGIN
        IF(SUBSTRING(id, 1, 2) <= '96') THEN
            RETURN CONCAT('20', id);
        ELSE
            RETURN CONCAT('19', id);
        END IF;
    END//

DELIMITER ;
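
Since the question says the records only go back to 96, ids starting with 96 presumably belong in the 1900s too. Here is a sketch of the simplified function with a strict `<` comparison, so that 96 through 99 map to 19xx and 00 through 95 map to 20xx. The function name `custom_sort_1996`, the table `records` and the column `record_id` are placeholders, not names from the question. It also assumes every id is zero-padded to the same length within a given year, which is what lets the 5-digit 96/97 ids and the 6-digit later ids compare correctly as strings.

```sql
DELIMITER //

-- Hypothetical name, chosen to avoid clashing with custom_sort above.
CREATE FUNCTION custom_sort_1996(id VARCHAR(8))
    RETURNS VARCHAR(10)
    DETERMINISTIC
    BEGIN
        -- Assumption: two-digit years 96-99 mean 19xx, 00-95 mean 20xx.
        IF SUBSTRING(id, 1, 2) < '96' THEN
            RETURN CONCAT('20', id);
        ELSE
            RETURN CONCAT('19', id);
        END IF;
    END//

DELIMITER ;

-- Usage with placeholder table and column names.
SELECT record_id
FROM records
ORDER BY custom_sort_1996(record_id);
```

The key stays a string rather than a number on purpose: the four-digit year prefix decides the order between years, so the shorter 96/97 keys are not pushed ahead of everything else the way they would be after a cast to an integer.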

6 Comments

much closer! but i think the 5 digit vs 6 digit thing is a problem (which is why i was trying to cast to INT)... if i sort ASC i get 97001, 97002, 97003, if i sort DESC i get 96323, 96322, 96321. the "first" record should probably be 96something, and the "last" should be 12something...
One liner: CONCAT( IF(SUBSTRING(id, 1, 2)<='96','20','19'),id)
@biziclop tried both variants, still getting 97001 when sorting by ASC, despite records existing with ids like 120001, etc... any ideas?
@BigMoMo 96 turns into 2096 according to the if... if you want 96 to be 1996, the "if" should be <96, not <=96.
@BigMoMo To make it clear, the if currently says that if "year" is less than or equal to 96, it is sorted as 20xx, if it's greater than 96 it's sorted as 19xx. So 1997 coming before 2096 is correct sort order, no?