I have a table with a varchar(50) column, columnName. I loaded values into it from a local CSV file, and the table now looks like the one below. The import produced no errors or warnings, and I have imported other CSV files of the same format (Windows Comma Separated) without this issue.

***************
ID * columnName
***************
1  * any
2  * thing
3  * helpful

When I run:

SELECT * FROM myDB.tableName;

I see the table as shown above. However, when I run:

SELECT * FROM myDB.tableName WHERE columnName = "any";

I get no rows returned. If I then overwrite the csv loaded value in the table by:

UPDATE myDB.tableName SET columnName='any' WHERE ID= 1;

and then run the same query, then the row is returned as expected. So, at this point, I have two questions:

  1. How can I prevent the csv uploading values that are not recognized as strings?

  2. How can I bulk update all of the currently loaded values in columnName to be recognized as strings (I can't do individual updates as shown above, since there are too many rows affected)?

  • What does the CSV look like? What are you using to load the values? It may be that the columns are padded with spaces or some other odd thing. Have you tried a query like WHERE columnName like 'any%' ? Commented Mar 11, 2016 at 22:56
  • Yes, when I query like 'any%' it does return the expected rows, so it does appear that the import is adding characters to the end... Commented Mar 11, 2016 at 23:05

2 Answers

If the .csv file is from Windows, the file may use CRLF as the line delimiter.

And if the LOAD DATA specifies LINES TERMINATED BY '\n' (which is also the default when no LINES clause is given), you might be picking up the CR character as part of the last column.

It's also possible you are picking up trailing spaces.

That's really just a guess.

If that's the case, you might need your LOAD DATA to specify the CRLF as the line terminator, and you may also want to run that last field through a TRIM function.

My LOAD DATA from .csv file created on Windows would look something like this (excerpted, not complete):

LOAD DATA ...
... 
LINES TERMINATED BY '\r\n'
...
( id
, @fld2
)
SET columnName = TRIM(@fld2)
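
For reference, a fuller version of the excerpt above might look like the following sketch. The file path, the comma field separator, the optional double-quote enclosure, and the header row skipped by IGNORE 1 LINES are all assumptions for illustration; adjust them to match the actual file.

```sql
-- Sketch only: the path and the FIELDS / IGNORE clauses are assumed, not taken from the question.
LOAD DATA LOCAL INFILE '/path/to/file.csv'   -- hypothetical path
INTO TABLE myDB.tableName
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'                   -- Windows CRLF line endings
IGNORE 1 LINES                               -- assumes the file has a header row
(ID, @fld2)                                  -- read the second field into a user variable
SET columnName = TRIM(@fld2);                -- strip leading/trailing spaces before storing
```

With the CRLF terminator in place, the CR is consumed as part of the line ending rather than stored in the last column, and TRIM handles any padding spaces.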

To debug what is currently stored in the column from your load, you could use the HEX function. (That's the closest thing I've found in MySQL to an Oracle-style DUMP() function.)

With the latin1 character set, the CR character is shown as x'0D'. A space is x'20' and a tab character is x'09'.

SELECT HEX('abc'), HEX('abc \t\r');

HEX('abc')   HEX('abc \t\r')
----------   ---------------
616263       61626320090D

So, to check for what's stored, you could run something like this:

SELECT columnName, HEX(columnName)
  FROM myDB.tableName
 WHERE ID = 1;

Based on that, you can make appropriate adjustments to the LOAD DATA statement.
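
To see how many rows are affected rather than checking one ID at a time, a query along these lines could list the values whose last character is a CR, space, or tab. This is a sketch that assumes a single-byte-compatible character set such as latin1 or utf8, where those characters encode as 0D, 20, and 09; the LIMIT is arbitrary.

```sql
-- List rows whose stored value ends in CR (0D), space (20), or tab (09).
SELECT ID,
       columnName,
       HEX(columnName)         AS hex_value,
       CHAR_LENGTH(columnName) AS char_len
  FROM myDB.tableName
 WHERE HEX(RIGHT(columnName, 1)) IN ('0D', '20', '09')
 LIMIT 20;   -- arbitrary cap so the sample stays readable
```

Comparing the hex of the last character avoids collation rules that can ignore trailing spaces in ordinary string comparisons.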

Using the technique of loading the field into a user-defined variable (as shown in my example LOAD DATA, which loads the field contents into @fld2), you can use a SET clause to assign an expression to a column. The expression can make use of any number of built-in MySQL functions. For example, to remove tab characters from the string:

  SET columnName = REPLACE(@fld2,'\t','')

1 Comment

I was picking up extra characters and this helped to identify and fix the problem. Thanks!

I agree with @bitfiddler that it looks like your data includes whitespace or non-printable characters. If you can't clean the data as it's added, executing

UPDATE myDB.tableName SET columnName=TRIM(columnName) 

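will do a bulk update of the data in place, but might take a while if the dataset is large. Note that TRIM() with no other arguments removes only leading and trailing spaces, so a trailing CR character would survive it.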
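
If the stray character turns out to be a carriage return, a variant like the following sketch might be closer to what is needed. It assumes the unwanted characters are a trailing CR and/or spaces, and that the column uses a character set where CR and space encode as 0D and 20; check a sample with HEX() first, and consider running it inside a transaction.

```sql
-- Strip a trailing CR first, then any leading/trailing spaces.
-- The WHERE clause limits the update to rows that end in CR (0D) or space (20).
UPDATE myDB.tableName
   SET columnName = TRIM(TRIM(TRAILING '\r' FROM columnName))
 WHERE HEX(RIGHT(columnName, 1)) IN ('0D', '20');
```

Values where a space follows the CR, or that contain tabs, would need an extra step such as REPLACE(columnName, '\t', '').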
