
My goal is to select values using a WHERE clause in HANA SQL.

My_Table contains a column PartnerID of type NVARCHAR that stores an integer number.

The following works fine:

SELECT * FROM "My_Schema"."My_Table" WHERE 10 < TO_INTEGER("PartnerID");

giving the expected result.

However

SELECT * FROM "My_Schema"."My_Table" WHERE 10 > TO_INTEGER("PartnerID");

throws the following error:

JDBC: [339]: invalid number: SQL error

How is it possible that < works but the > operator fails?

  • So My_Table is a column store table? If it is a view, then for sure lots of other things could happen under the hood Commented Dec 10, 2024 at 18:33
  • The OP didn't include this information - but column store has been the default table type for a long time and I was able to reproduce the behaviour fairly easily with a column store table. Commented Dec 10, 2024 at 23:00

2 Answers

1

This seemingly inconsistent behaviour is the result of how the HANA column store works.

Strictly speaking, the result should be the same for both cases, but database systems are also just leaky abstractions on files sometimes.

The mechanics of this behaviour are as follows: in the column store, the data for each column, e.g. your "PartnerID", is stored in its own set of data structures.

At first (right after a record has been created or updated) the information is held in what is called the "delta store" which is pretty much storing the column values as-is. This allows for relatively quick inserts/updates. After a while, HANA will take the data from the "delta store" and fit it into the actual column data structures ("main store").

The most important characteristic of the column "main store" is that all values of a column are stored in a dictionary and assigned an internal value-ID. (The neat thing is that the size of this value-ID can change according to how many different values there are in a column: fewer distinct values require fewer bits per key, which means less memory and faster scans.)

The actual use of a specific value is then marked in the column-value-vector: every table row corresponds to a specific point in this vector and the value-ID at this point determines which value is present in the row. So far, so easy. (In case this isn't easy/obvious/simple, there are plenty of really good materials available that explain how column stores and specifically the HANA one work.)
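The dictionary and value-vector idea can be sketched in a few lines of Python. This is only a toy model of dictionary encoding in general, not HANA's actual implementation; the function name encode_column and the sample values are made up for illustration.

```python
def encode_column(values):
    """Dictionary-encode a column (toy model, not HANA's real layout).

    Returns the sorted dictionary, the value vector (one value-ID per row)
    and the number of bits needed to store one value-ID.
    """
    dictionary = sorted(set(values))                  # distinct values, sorted
    value_id = {v: i for i, v in enumerate(dictionary)}
    value_vector = [value_id[v] for v in values]      # row position -> value-ID
    # n distinct values need ceil(log2(n)) bits, and at least one bit.
    bits = max(1, (len(dictionary) - 1).bit_length())
    return dictionary, value_vector, bits


# Hypothetical "PartnerID" contents, stored as strings like an NVARCHAR column.
partner_ids = ["12", "10", "25", "10", "12", "12"]
dictionary, value_vector, bits = encode_column(partner_ids)
print(dictionary)    # ['10', '12', '25']
print(value_vector)  # [1, 0, 2, 0, 1, 1]
print(bits)          # 2
```

Six rows are stored as six small value-IDs plus three dictionary entries, and two bits per row are enough as long as the column has at most four distinct values.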

Now, when HANA has to find the rows that match a condition in the WHERE clause, it needs to find out which value-IDs to look for in the value-vector. For something like 10 < TO_INTEGER("PartnerID") it needs to look into the dictionary and find entries larger than 10. I haven't mentioned it before, but the column dictionary is always sorted. This allows for binary search strategies.

So, HANA can readily go and probe the dictionary until it finds the entry '10' and converts it to 10. Due to the sorted dictionary it can now just take any value larger than that, convert it and return the values.

This works, as long as all the values from '10' upwards can be correctly converted to integers.

For the opposite case, imagine that some of the values below 10 are not convertible to integers. In this case, HANA will again find the entry '10' in the dictionary, but now it probes the entries below it and tries to convert those. That conversion fails and raises the error.
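The following Python sketch replays this explanation on a toy dictionary. It is an illustration of the hypothesis in this answer, not of what HANA provably does internally. The function matching_value_ids and the sample data are invented, the empty string stands in for a non-convertible value, and all valid entries have two digits so that string order and numeric order agree (in general they do not, e.g. '2' sorts after '100').

```python
from bisect import bisect_left, bisect_right

# Toy sorted string dictionary; '' is the entry that cannot be converted.
dictionary = sorted(["25", "10", "", "12", "11"])   # ['', '10', '11', '12', '25']


def matching_value_ids(dictionary, pivot, op):
    """Return the value-IDs whose converted entry satisfies `pivot op entry`.

    Only entries on the relevant side of the pivot are visited and converted,
    mimicking the probing described above (an assumption, see lead-in).
    """
    key = str(pivot)
    if op == "<":    # pivot < entry: visit entries sorted above the pivot
        candidates = range(bisect_right(dictionary, key), len(dictionary))
    elif op == ">":  # pivot > entry: visit entries sorted below the pivot
        candidates = range(0, bisect_left(dictionary, key))
    else:
        raise ValueError(f"unsupported operator: {op}")

    result = []
    for value_id in candidates:
        number = int(dictionary[value_id])   # raises ValueError for ''
        if (pivot < number) if op == "<" else (pivot > number):
            result.append(value_id)
    return result


print(matching_value_ids(dictionary, 10, "<"))   # [2, 3, 4]

try:
    matching_value_ids(dictionary, 10, ">")
except ValueError as err:
    print("conversion failed:", err)
```

The first call never touches the bad entry because it sorts below '10'; the second call has to convert it and fails, which mirrors the two statements in the question.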

So, this is the simple case for when this can happen. If the delta store contains entries that cannot be converted, this will also lead to the conversion error, in that case for both statements, however.

There have been plenty of discussions in the past about how to gracefully handle such conversion issues. Search terms like HANA + IS_NUMERIC should yield the corresponding web pages.

As the original question was "how could this happen in HANA?" I leave this answer at this point.


3 Comments

"Strictly speaking, the result should be the same for both cases" - why? Who says that an expression always needs to be evaluated?
Also I am not convinced that 10 < TO_INTEGER(...) can be pushed down to a string dict, after all '2' > '100' - might be wrong though
Nobody asked for the expression to be always evaluated. But data independence in DBMS should hide/abstract effects of how the data is stored from the application. Regarding the pushdown to the string dict - yeah, I thought about that too, but that's nothing I can reasonably test. Obviously, this behaviour depends on what data is in the column.
0

A conversion error only occurs if a conversion is actually performed for a certain value and the corresponding row has not been filtered out beforehand. Even if there is seemingly just a single filter, the following might happen:

  • A certain row is visible to one query but not the other due to transactional consistency
  • The query is aborted before the row is reached e.g. due to a LIMIT (this is implicitly set in some clients) or the client only fetched a part of the result
  • The Optimizer could pull some tricks on the filter and rewrite the filter so that rows are filtered before the expression is evaluated
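The second point (a scan that stops early) is easy to reproduce outside the database. The sketch below is plain Python, not HANA: scan is a made-up lazy row scan, and islice plays the role of a LIMIT or of a client that only fetches part of the result.

```python
from itertools import islice

# Hypothetical "PartnerID" values in row order; 'n/a' cannot be converted.
rows = ["11", "12", "25", "n/a", "30"]


def scan(rows, predicate):
    """Lazily yield rows whose converted value satisfies the predicate."""
    for raw in rows:
        # The conversion only happens once the scan actually reaches the row.
        if predicate(int(raw)):
            yield raw


# Stopping after two matches: the scan never reaches 'n/a'.
first_two = list(islice(scan(rows, lambda n: 10 < n), 2))
print(first_two)   # ['11', '12']

# Reading the full result forces the conversion of every row.
try:
    list(scan(rows, lambda n: 10 < n))
except ValueError as err:
    print("conversion failed:", err)
```

Same data, same filter, and whether the error shows up depends only on how far the scan gets.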

TLDR: You need to deal with the invalid values in "PartnerID" and remove them from the table or adapt your query to deal with them - there is no guarantee whatsoever that the expression will not be evaluated some day for one of the rows with invalid values.
