I have a table in Hive in which one of the columns is a string. The values in that column look like "x=1,y=2,z=3". I need to write a query that sums the value of x in this column across all rows. How do I extract the value of x from each row and add them up?
1 Answer
One way to do this is to write a UDF for the transformation:
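package com.example;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Extracts the integer value of the key "x" from a string such as "x=1,y=2,z=3".
public class SplitColumn extends UDF {
    public Integer evaluate(Text input) {
        if (input == null) return null;
        // Split into "key=value" pairs, then find the pair whose key is "x".
        for (String pair : input.toString().split(",")) {
            String[] kv = pair.split("=", 2);
            if (kv.length == 2 && kv[0].trim().equals("x")) {
                return Integer.parseInt(kv[1].trim());
            }
        }
        return null; // no "x" key in this row
    }
}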
Now you can try this:
hive> ADD JAR target/hive-extensions-1.0-SNAPSHOT-jar-with-dependencies.jar;
hive> CREATE TEMPORARY FUNCTION SplitColumn as 'com.example.SplitColumn';
hive> select sum(SplitColumn(mycolumnName)) from mytable;
P.S. I have not tested this, but it should give you a direction to proceed.
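Since the UDF is untested, it may help to check the parsing step in plain Java before packaging the jar. The following is only a sketch under assumptions: the class XValueParser and its methods extractX and sumX are hypothetical names made up for illustration, and the input is assumed to be comma-separated key=value pairs as in the question. A value of x that is not an integer would make Integer.parseInt throw a NumberFormatException.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of Hive) that mirrors the parsing a UDF would do.
final class XValueParser {

    // Returns the integer stored under the key "x", or null if the row
    // is null or contains no "x" key.
    static Integer extractX(String row) {
        if (row == null) return null;
        for (String pair : row.split(",")) {
            String[] kv = pair.split("=", 2);
            if (kv.length == 2 && kv[0].trim().equals("x")) {
                return Integer.parseInt(kv[1].trim());
            }
        }
        return null;
    }

    // Sums x over all rows, skipping rows that yield null.
    static long sumX(List<String> rows) {
        long total = 0;
        for (String row : rows) {
            Integer x = extractX(row);
            if (x != null) total += x;
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("x=1,y=2,z=3", "y=5,x=10", "z=7");
        System.out.println(sumX(rows)); // prints 11
    }
}
```

The lookup goes by key name rather than by position, so rows where x is not the first pair are still handled, and rows without x are skipped instead of failing.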