
This question is similar to: Find and Extract value after specific String from a file using bash shell script?

I am executing a Hive query from a shell script and need to extract a value into a variable. The query is as follows:

sql="show create table dev.emp"
partition_col= `beeline -u $Beeline_URL -e $sql` | grep 'PARTITIONED BY' | cut -d "'" -f2`

The output of the SQL query is:

+----------------------------------------------------+
|                   createtab_stmt                   |
+----------------------------------------------------+
| CREATE EXTERNAL TABLE `dv.par_kst`(                |
|   `col1` string,                                   |
|   `col2` string,                                  |
|   `col3` string)                                  |
| PARTITIONED BY (                                   |
|   `part_col1` int,                                 |
|   `part_col2` int)                                 |
| ROW FORMAT SERDE                                   |
|   'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'  |
| STORED AS INPUTFORMAT                              |
|   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'  |
| OUTPUTFORMAT                                       |
|   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' |
| LOCATION                                           |
|   'hdfs://nameservicets1/dv/hdfsdata/par_kst' |
| TBLPROPERTIES (                                    |
|   'spark.sql.create.version'='2.2 or prior',       |
|   'spark.sql.sources.schema.numPartCols'='2',      |
|   'spark.sql.sources.schema.numParts'='1',         |
|   'spark.sql.sources.schema.part.0'='{"type":"struct","fields":[{"name":"col1","type":"string","nullable":true,"metadata":{}},{"name":"col2","type":"string","nullable":true,"metadata":{}},{"name":"col3","type":"integer","nullable":true,"metadata":{}},{"name":"part_col2","type":"integer","nullable":true,"metadata":{}}]}',  |
|   'spark.sql.sources.schema.partCol.0'='part_col1', |
|   'spark.sql.sources.schema.partCol.1'='part_col2', |
|   'transient_lastDdlTime'='1587487456')            |
+----------------------------------------------------+

From the above output, I want to extract the PARTITIONED BY details.

Desired output:

part_col1 , part_col2

I tried the code below, but it does not return the correct value:

partition_col=`beeline -u $Beeline_URL -e $sql` | grep 'PARTITIONED BY' | cut -d "'" -f2`

The number of PARTITIONED BY columns is not fixed; another table might have 3 or more, so I want to extract all of them.

That is, all the values between PARTITIONED BY and ROW FORMAT SERDE, with the spaces, backticks (`) and data types removed.

2 Answers


Using sed

sed -n  '/PARTITIONED BY/,/ROW FORMAT SERD/p' file.txt | sed  '1d; $d' |  sed  -E 's/.*(`.*`).*/\1/g' |  tr -d '`' | tr '\n' ','

Demo:

$ sed -n  '/PARTITIONED BY/,/ROW FORMAT SERD/p' file.txt | sed  '1d; $d' |  sed  -E 's/.*(`.*`).*/\1/g' |  tr -d '`'  | tr '\n' ','
part_col1,part_col2,$
$

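Explanation:

sed -n '/PARTITIONED BY/,/ROW FORMAT SERD/p' <-- print the lines from the first pattern through the second, inclusive

sed '1d; $d' <-- delete the first and last lines (the two pattern lines)

sed -E 's/.*(`.*`).*/\1/g' <-- keep only the backtick-quoted name on each line

tr -d '`' <-- delete the backtick characters

tr '\n' ',' <-- replace each newline with a comma (this leaves a trailing comma)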
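
The pipeline above ends with a trailing comma. As a minimal sketch, assuming a trimmed copy of the question's output saved to a temporary file, the last two stages can be swapped for a capture group that drops the backticks and `paste -sd,`, which joins lines without a trailing separator. The variable names `sample_file` and `partition_cols` are illustrative only.

```shell
#!/usr/bin/env bash
# Sample input: a trimmed copy of the beeline output shown in the question.
sample_file=$(mktemp)
cat > "$sample_file" <<'EOF'
| CREATE EXTERNAL TABLE `dv.par_kst`(                |
|   `col1` string,                                   |
|   `col3` string)                                   |
| PARTITIONED BY (                                   |
|   `part_col1` int,                                 |
|   `part_col2` int)                                 |
| ROW FORMAT SERDE                                   |
|   'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'  |
EOF

partition_cols=$(
  # 1. keep the block between the two marker lines (inclusive)
  sed -n '/PARTITIONED BY/,/ROW FORMAT SERDE/p' "$sample_file" |
    # 2. drop the marker lines themselves
    sed '1d; $d' |
    # 3. keep only the text between the backticks
    sed -E 's/.*`(.*)`.*/\1/' |
    # 4. join the lines with commas, no trailing comma
    paste -sd, -
)
rm -f "$sample_file"

echo "$partition_cols"
```

This is a variation on the answer's approach, not a replacement for it; `tr '\n' ','` followed by stripping the last character would work as well.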


3 Comments

Thank you so much for the detailed explanation. I don't have those values in file.txt, though; I generate them from a Hive query, so they are held in a variable. Will this work for a variable as well?
Yes. In your code, put this pipeline where you used grep, with the whole pipeline inside the command substitution:
partition_col=$(beeline -u "$Beeline_URL" -e "$sql" | sed -n '/PARTITIONED BY/,/ROW FORMAT SERD/p' | sed '1d; $d' | sed -E 's/.*(`.*`).*/\1/g' | tr -d '`' | tr '\n' ',')
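
A minimal sketch of why the closing parenthesis of `$(...)` has to come after the last filter. Here `fake_beeline` is a hypothetical stand-in for the real `beeline -u "$Beeline_URL" -e "$sql"` call, and `extract_cols` is an illustrative helper wrapping the sed pipeline; neither name comes from the original post.

```shell
#!/usr/bin/env bash
# fake_beeline is a hypothetical stand-in for the real beeline call.
fake_beeline() {
  cat <<'EOF'
| PARTITIONED BY (                                   |
|   `part_col1` int,                                 |
|   `part_col2` int)                                 |
| ROW FORMAT SERDE                                   |
EOF
}

# The filter stages from the answer, reading from stdin.
extract_cols() {
  sed -n '/PARTITIONED BY/,/ROW FORMAT SERDE/p' |
    sed '1d; $d' |
    sed -E 's/.*`(.*)`.*/\1/' |
    tr '\n' ','
}

# Broken: the assignment is only the first stage of a pipeline. It runs in a
# subshell and writes nothing to stdout, so the filters see empty input and
# the variable is never set in the current shell.
unset broken
broken=$(fake_beeline) | extract_cols
echo "broken:  '${broken:-}'"

# Working: the whole pipeline sits inside the command substitution.
working=$(fake_beeline | extract_cols)
echo "working: '$working'"
```

Quoting `"$sql"` in the real command matters for the same kind of reason: unquoted, the query string is split into separate words before beeline sees it.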

You could use awk:

/PARTITIONED BY \(/  {partitioned_by = 1; next}
/ROW FORMAT SERDE/  {partitioned_by = 0; next}
partitioned_by == 1 {a[n++] = substr($2, 2, length($2) - 2)}
END { for (i in a) printf "%s, ", i}

Store the above in a file called beeline.awk and run it with:

partition_col=$(beeline -u "$Beeline_URL" -e "$sql" | awk -f beeline.awk)

2 Comments

Hello, yes, I did the same, but the query result is coming out as 0,1,
I tried saving the query result in a file and running the following: /PARTITIONED BY (/ {partitioned_by = 1; next} /ROW FORMAT SERDE/ {partitioned_by = 0; next} partitioned_by == 1 {a[n++] = substr($2, 2, length($2) - 2)} END { for (i in a) printf "%s, ", i} "file.txt"
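
The `0,1,` result reported above is what the awk script prints as written: `for (i in a)` iterates over the array indices, and the `printf` prints `i` rather than `a[i]`. A minimal sketch of one possible correction follows, assuming a short sample of the question's output held in a shell variable. The names `sample_output`, `awk_prog`, `in_part`, and `cols` are illustrative. It uses a counted loop because the order of `for (i in a)` is not guaranteed in awk.

```shell
#!/usr/bin/env bash
# Sample input: the relevant slice of the beeline output from the question.
sample_output='| PARTITIONED BY (                                   |
|   `part_col1` int,                                 |
|   `part_col2` int)                                 |
| ROW FORMAT SERDE                                   |'

awk_prog='
/PARTITIONED BY \(/ { in_part = 1; next }   # start collecting after this line
/ROW FORMAT SERDE/  { in_part = 0; next }   # stop collecting at this line
in_part {
  name = $2                                 # second field is the quoted name
  gsub(/`/, "", name)                       # strip the backticks
  cols[n++] = name
}
END {
  # Walk the indices in insertion order and print the stored values,
  # separating them with ", " and ending with a newline.
  for (i = 0; i < n; i++)
    printf "%s%s", cols[i], (i < n - 1 ? ", " : "\n")
}'

partition_cols=$(printf '%s\n' "$sample_output" | awk "$awk_prog")
echo "$partition_cols"
```

With the real query, the `printf ... |` part would be replaced by the beeline call, still inside the `$(...)`.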
