Find a string and extract values from result of hive query using shell script?

Question

Similar to: Find and Extract value after specific String from a file using bash shell script?

I am executing a Hive query from a shell script and need to extract a value into a variable. The query is:

sql="show create table dev.emp"
partition_col=$(beeline -u "$Beeline_URL" -e "$sql" | grep 'PARTITIONED BY' | cut -d "'" -f2)

The output of the query is:

+----------------------------------------------------+
|                   createtab_stmt                   |
+----------------------------------------------------+
| CREATE EXTERNAL TABLE `dv.par_kst`(                |
|   `col1` string,                                   |
|   `col2` string,                                  |
|   `col3` string)                                  |
| PARTITIONED BY (                                   |
|   `part_col1` int,                                 |
|   `part_col2` int)                                 |
| ROW FORMAT SERDE                                   |
|   'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'  |
| STORED AS INPUTFORMAT                              |
|   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'  |
| OUTPUTFORMAT                                       |
|   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' |
| LOCATION                                           |
|   'hdfs://nameservicets1/dv/hdfsdata/par_kst' |
| TBLPROPERTIES (                                    |
|   'spark.sql.create.version'='2.2 or prior',       |
|   'spark.sql.sources.schema.numPartCols'='2',      |
|   'spark.sql.sources.schema.numParts'='1',         |
|   'spark.sql.sources.schema.part.0'='{"type":"struct","fields":[{"name":"col1","type":"string","nullable":true,"metadata":{}},{"name":"col2","type":"string","nullable":true,"metadata":{}},{"name":"col3","type":"integer","nullable":true,"metadata":{}},{"name":"part_col2","type":"integer","nullable":true,"metadata":{}}]}',  |
|   'spark.sql.sources.schema.partCol.0'='part_col1', |
|   'spark.sql.sources.schema.partCol.1'='part_col2', |
|   'transient_lastDdlTime'='1587487456')            |
+----------------------------------------------------+

From the output above, I want to extract the PARTITIONED BY details.

Desired output:

part_col1 , part_col2

I tried the code below, but it does not return the correct value:

partition_col=$(beeline -u "$Beeline_URL" -e "$sql" | grep 'PARTITIONED BY' | cut -d "'" -f2)
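Whatever the shell syntax, this grep/cut pipeline cannot return the column names: grep passes on only the single line that contains PARTITIONED BY (the names are on the lines after it), and that line contains no single quote, so cut prints it unchanged. A minimal sketch that shows this, assuming a trimmed excerpt of the output above is fed in through a here-doc instead of running beeline:

```shell
# Assumption: sample prints a trimmed excerpt of the beeline output from the
# question; it replaces the real beeline call for this demonstration.
sample() {
  cat <<'EOF'
| CREATE EXTERNAL TABLE `dv.par_kst`(                |
|   `col1` string,                                   |
| PARTITIONED BY (                                   |
|   `part_col1` int,                                 |
|   `part_col2` int)                                 |
| ROW FORMAT SERDE                                   |
EOF
}

# grep keeps only the line containing the keyword; the column names are on
# the following lines, so they never reach cut.
matched=$(sample | grep 'PARTITIONED BY')
echo "grep keeps:  $matched"

# That line has no single quote, and cut passes lines without the delimiter
# through unchanged (unless -s is given), so the whole line comes back.
result=$(sample | grep 'PARTITIONED BY' | cut -d "'" -f2)
echo "cut returns: $result"
```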

The PARTITIONED BY clause is not fixed: another table might have three or more partition columns, so I want to extract all of them.

That is, all the values between PARTITIONED BY and ROW FORMAT SERDE, with the spaces, backticks (`) and data types removed.

Digvijay S · Accepted Answer · 2020-04-22 05:34:34Z

2

Using sed

sed -n  '/PARTITIONED BY/,/ROW FORMAT SERD/p' file.txt | sed  '1d; $d' |  sed  -E 's/.*(`.*`).*/\1/g' |  tr -d '`' | tr '\n' ','
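A self-contained sketch of the pipeline above, assuming a here-doc holding an excerpt of the question's output stands in for file.txt. The last echo is an optional addition, not part of the answer: it drops the trailing comma with the ${var%,} parameter expansion.

```shell
# Assumption: sample prints an excerpt of the question's output and takes
# the place of file.txt.
sample() {
  cat <<'EOF'
|   `col2` string,                                  |
|   `col3` string)                                  |
| PARTITIONED BY (                                   |
|   `part_col1` int,                                 |
|   `part_col2` int)                                 |
| ROW FORMAT SERDE                                   |
|   'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'  |
EOF
}

# Same stages as the answer: select the clause, drop the two pattern lines,
# keep the backquoted name, delete the backticks, join lines with commas.
partition_col=$(sample |
  sed -n '/PARTITIONED BY/,/ROW FORMAT SERD/p' |
  sed '1d; $d' |
  sed -E 's/.*(`.*`).*/\1/g' |
  tr -d '`' |
  tr '\n' ',')
echo "$partition_col"        # part_col1,part_col2,

# Optional: remove the single trailing comma left by tr.
trimmed=${partition_col%,}
echo "$trimmed"              # part_col1,part_col2
```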

Demo:

$ sed -n '/PARTITIONED BY/,/ROW FORMAT SERD/p' file.txt | sed '1d; $d' | sed -E 's/.*(`.*`).*/\1/g' | tr -d '`' | tr '\n' ','
part_col1,part_col2,$
$

Explanation:

sed -n '/PARTITIONED BY/,/ROW FORMAT SERD/p' <-- print the lines between the two patterns (both pattern lines included)

sed '1d; $d' <-- delete the first and last lines (the two pattern lines)

sed -E 's/.*(`.*`).*/\1/g' <-- keep only the string between the backticks (`)

tr -d '`' <-- delete the backtick characters

tr '\n' ',' <-- replace each newline with a comma
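To see what each stage in the list above produces, the stages can be run one at a time. This sketch assumes a hypothetical table whose PARTITIONED BY clause has three columns (part_col3 is invented for illustration), since the question notes the number of partition columns varies:

```shell
# Assumption: hypothetical input with three partition columns; part_col3 is
# made up to show that the range does not depend on the column count.
sample() {
  cat <<'EOF'
|   `col3` string)                                   |
| PARTITIONED BY (                                   |
|   `part_col1` int,                                 |
|   `part_col2` int,                                 |
|   `part_col3` string)                              |
| ROW FORMAT SERDE                                   |
|   'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'  |
EOF
}

# 1. lines from PARTITIONED BY to ROW FORMAT SERDE, both included
stage1=$(sample | sed -n '/PARTITIONED BY/,/ROW FORMAT SERD/p')
# 2. first and last line (the two pattern lines) removed
stage2=$(printf '%s\n' "$stage1" | sed '1d; $d')
# 3. only the backquoted name kept on each line
stage3=$(printf '%s\n' "$stage2" | sed -E 's/.*(`.*`).*/\1/g')
# 4. backticks deleted
stage4=$(printf '%s\n' "$stage3" | tr -d '`')
# 5. newlines turned into commas
stage5=$(printf '%s\n' "$stage4" | tr '\n' ',')

# Print every stage, separated by a marker line.
for s in "$stage1" "$stage2" "$stage3" "$stage4" "$stage5"; do
  printf '%s\n---\n' "$s"
done
```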

answered Apr 22, 2020 at 5:34


3 Comments

Harshit Kakkar Over a year ago

Thank you so much for the detailed explanation, but I don't have those values in file.txt; I am generating them from a hive query, so a variable holds those values instead of file.txt. Will this work for a variable as well?

Digvijay S Over a year ago

Yes. In your code, put this where you have used grep:

Digvijay S Over a year ago

partition_col=$(beeline -u "$Beeline_URL" -e "$sql" | sed -n '/PARTITIONED BY/,/ROW FORMAT SERD/p' | sed '1d; $d' | sed -E 's/.*(`.*`).*/\1/g' | tr -d '`' | tr '\n' ',')
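The position of the closing ) matters: the command substitution has to wrap the whole pipeline. If it closes right after the beeline call, the assignment becomes one stage of the pipeline, runs in a subshell in bash and other common shells, and the variable stays empty in the calling shell. A sketch of both forms, assuming a hypothetical stub function named beeline that ignores its arguments and prints an excerpt of the question's output, plus a placeholder value for Beeline_URL:

```shell
# Hypothetical stub: stands in for the real beeline CLI, ignores its
# arguments and prints an excerpt of the question's output.
beeline() {
  cat <<'EOF'
| PARTITIONED BY (                                   |
|   `part_col1` int,                                 |
|   `part_col2` int)                                 |
| ROW FORMAT SERDE                                   |
EOF
}
Beeline_URL='jdbc:hive2://example:10000'   # hypothetical placeholder
sql="show create table dev.emp"

# Correct: $( ... ) encloses every stage, and "$sql" is quoted so the query
# reaches beeline as a single argument.
partition_col=$(beeline -u "$Beeline_URL" -e "$sql" |
  sed -n '/PARTITIONED BY/,/ROW FORMAT SERD/p' |
  sed '1d; $d' |
  sed -E 's/.*(`.*`).*/\1/g' |
  tr -d '`' |
  tr '\n' ',')
echo "partition_col=$partition_col"   # partition_col=part_col1,part_col2,

# Broken: the assignment is the first stage of a pipeline, so it happens in
# a subshell and is lost; broken is still empty afterwards.
unset broken
broken=$(beeline -u "$Beeline_URL" -e "$sql") | sed -n '1p'
echo "broken=${broken-}"              # broken=
```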

Paul Evans · Accepted Answer · 2020-04-21 21:53:13Z

0

You could use awk:

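# start collecting after the "PARTITIONED BY (" line
/PARTITIONED BY \(/  {partitioned_by = 1; next}
# stop at the "ROW FORMAT SERDE" line
/ROW FORMAT SERDE/  {partitioned_by = 0; next}
# $2 is the backquoted column name, e.g. `part_col1`; strip the backquotes
partitioned_by == 1 {a[n++] = substr($2, 2, length($2) - 2)}
# print the stored values a[i] (not the indices i) in their original order
END { for (i = 0; i < n; i++) printf "%s%s", (i ? ", " : ""), a[i]; print "" }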

Store the above in a file called beeline.awk and run it with:

partition_col=$(beeline -u "$Beeline_URL" -e "$sql" | awk -f beeline.awk)
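A self-contained sketch of the awk approach, with the END block printing the stored values a[i] in insertion order rather than the indices, and the program held in a shell variable instead of beeline.awk. It assumes a here-doc with an excerpt of the question's output takes the place of beeline:

```shell
# Assumption: sample prints an excerpt of the question's beeline output.
sample() {
  cat <<'EOF'
|   `col3` string)                                  |
| PARTITIONED BY (                                   |
|   `part_col1` int,                                 |
|   `part_col2` int)                                 |
| ROW FORMAT SERDE                                   |
|   'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'  |
EOF
}

prog='
  # start collecting after the "PARTITIONED BY (" line
  /PARTITIONED BY \(/ { partitioned_by = 1; next }
  # stop at the "ROW FORMAT SERDE" line
  /ROW FORMAT SERDE/  { partitioned_by = 0; next }
  # $2 is the backquoted name; drop its first and last character
  partitioned_by == 1 { a[n++] = substr($2, 2, length($2) - 2) }
  # print the stored values in their original order, comma separated
  END { for (i = 0; i < n; i++) printf "%s%s", (i ? ", " : ""), a[i]; print "" }
'

partition_col=$(sample | awk "$prog")
echo "$partition_col"   # part_col1, part_col2
```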

edited Apr 21, 2020 at 21:53

answered Apr 21, 2020 at 21:38


2 Comments

Harshit Kakkar Over a year ago

Hello, yes, I did the same, but the query result is coming out as 0,1,

Harshit Kakkar Over a year ago

I tried saving the query result in a file and running the following: /PARTITIONED BY \(/ {partitioned_by = 1; next} /ROW FORMAT SERDE/ {partitioned_by = 0; next} partitioned_by == 1 {a[n++] = substr($2, 2, length($2) - 2)} END { for (i in a) printf "%s, ", i} "file.txt"
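The 0,1 result comes from the END block that prints i: in awk, for (i in a) sets i to each index of the array, not to the stored value, and the order in which indices are visited is unspecified. A small sketch that prints both the index and the value for two sample entries (piped through sort only to make the order stable):

```shell
# for (i in a) walks the indices; a[i] is the value stored at each index.
keys_and_values=$(awk 'BEGIN {
  a[0] = "part_col1"; a[1] = "part_col2"
  for (i in a) print i, a[i]    # i is the index, a[i] is the stored value
}' | sort)
printf '%s\n' "$keys_and_values"
# 0 part_col1
# 1 part_col2
```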
