Find and Extract value after specific String from a file using bash shell script?

Question

I have a file, file.txt, with the following contents:

+----------------------------------------------------+
|                   createtab_stmt                   |
+----------------------------------------------------+
| CREATE EXTERNAL TABLE `dv.par_kst`( |
|   `col1` string,                                   |
|   `col2` string,                                   |
|   `col3` int,                                      |
|   `col4` int,                                      |
|   `col5` string,                                   |
|   `col6` float,                                    |
|   `col7` int,                                      |
|   `col8` string,                                   |
|   `col9` string,                                   |
|   `col10` int,                                     |
|   `col11` int,                                     |
|   `col12` string,                                  |
|   `col13` float,                                   |
|   `col14` string,                                  |
|   `col15` string)                                  |
| PARTITIONED BY (                                   |
|   `part_col1` int,                                 |
|   `part_col2` int)                                 |
| ROW FORMAT SERDE                                   |
|   'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'  |
| STORED AS INPUTFORMAT                              |
|   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'  |
| OUTPUTFORMAT                                       |
|   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' |
| LOCATION                                           |
|   'hdfs://nameservicets1/dv/hdfsdata/par_kst' |
| TBLPROPERTIES (                                    |
|   'spark.sql.create.version'='2.2 or prior',       |
|   'spark.sql.sources.schema.numPartCols'='2',      |
|   'spark.sql.sources.schema.numParts'='1',         |
|   'spark.sql.sources.schema.part.0'='{"type":"struct","fields":[{"name":"col1","type":"string","nullable":true,"metadata":{}},{"name":"col2","type":"string","nullable":true,"metadata":{}},{"name":"col3","type":"integer","nullable":true,"metadata":{}},{"name":"col4","type":"integer","nullable":true,"metadata":{}},{"name":"col5","type":"string","nullable":true,"metadata":{}},{"name":"col6","type":"float","nullable":true,"metadata":{}},{"name":"col7","type":"integer","nullable":true,"metadata":{}},{"name":"col8","type":"string","nullable":true,"metadata":{}},{"name":"col9","type":"string","nullable":true,"metadata":{}},{"name":"col10","type":"integer","nullable":true,"metadata":{}},{"name":"col11","type":"integer","nullable":true,"metadata":{}},{"name":"col12","type":"string","nullable":true,"metadata":{}},{"name":"col13","type":"float","nullable":true,"metadata":{}},{"name":"col14","type":"string","nullable":true,"metadata":{}},{"name":"col15","type":"string","nullable":true,"metadata":{}},{"name":"part_col1","type":"integer","nullable":true,"metadata":{}},{"name":"part_col2","type":"integer","nullable":true,"metadata":{}}]}',  |
|   'spark.sql.sources.schema.partCol.0'='part_col1',  |
|   'spark.sql.sources.schema.partCol.1'='part_col2',  |
|   'transient_lastDdlTime'='1587487456')            |
+----------------------------------------------------+

From the above file I want to extract the PARTITIONED BY details.

Desired output :

part_col1 , part_col2

The number of PARTITIONED BY columns is not fixed: another file might contain three or more, so I want to extract all of them.

That is, I need all the values between PARTITIONED BY and ROW FORMAT SERDE, with the spaces, the "`" characters, and the data types removed.

Could you please help me with this?

Saboteur · Accepted Answer · 2020-04-21 22:15:39Z

1

sed -nr '/PARTITIONED BY/,/ROW FORMAT SERDE/p' file.txt|sed -nr '/`/p'|cut -d '`' -f 2|xargs -n 1 echo -n " "


2 Comments

harsh Over a year ago

Also, instead of having the records in file.txt, I have to run the query directly: par_col=$(beeline --silent -u "$BEELINE_URL" -e "$sql"), where sql="show create table dvs_wk.par_kst". par_col then holds the result shown above, but when I run result=$(sed -n '/PARTITIONED BY/,/ROW FORMAT SERDE/p' $par_col | sed -n '/`/p' | cut -d '`' -f 2 | xargs -n 1 echo -n " ") it gives me an error.

Saboteur Over a year ago

The first sed prints all lines between PARTITIONED BY and ROW FORMAT SERDE (inclusive). The second sed keeps only the lines that contain a "`" character. Then cut splits each line into fields on "`" and prints the second field (the value you want), and xargs collects all of those values and prints them with a space as the separator. It may not be the best pipeline, but it works on your example.
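
The error described in the first comment is most likely because sed treats its non-option arguments as file names, so passing $par_col makes it try to open the words of the table as files. A minimal sketch of piping the variable's content into the same pipeline instead is shown below. The sample text assigned to par_col is a shortened, hypothetical stand-in for the beeline output (beeline itself is not run here), and the trailing xargs is used without a command so that it joins the values with single spaces.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the beeline output; in the real script this would be
# par_col=$(beeline --silent -u "$BEELINE_URL" -e "$sql") as in the comment above.
par_col='| CREATE EXTERNAL TABLE `dv.par_kst`( |
|   `col1` string)                                   |
| PARTITIONED BY (                                   |
|   `part_col1` int,                                 |
|   `part_col2` int)                                 |
| ROW FORMAT SERDE                                   |'

# Feed the variable's content to sed on stdin rather than passing it as a file name.
result=$(printf '%s\n' "$par_col" |
  sed -n '/PARTITIONED BY/,/ROW FORMAT SERDE/p' |
  sed -n '/`/p' |
  cut -d '`' -f 2 |
  xargs)

echo "$result"
```

Quoting "$par_col" matters here: unquoted, the shell would split the table into words and expand the "|" and "`"-free tokens before sed ever saw them.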

ssr1012 · Accepted Answer · 2020-04-22 06:14:50Z

1

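# Slurp the input; this expects the table text after a __DATA__ marker
# (use <> instead of <DATA> to read a file named on the command line).
my $text = do { local $/; <DATA> };

my @partitioned;

# Capture the body of PARTITIONED BY ( ... ), then collect each backquoted name.
if ($text =~ /PARTITIONED BY\s*\(([^()]*)\)/) {
    my $fullcontent = $1;
    push @partitioned, $1 while $fullcontent =~ /`([^`]+)`/g;
}

print join(", ", @partitioned), "\n";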

Output:

part_col1, part_col2


Walter A · Accepted Answer · 2020-04-22 18:52:21Z

1

If the layout of the result doesn't matter, you can tell sed to consider only the lines between a start tag and an end tag, and to print such a line only when it contains a field between two backquotes.

sed -rn '/PARTITIONED BY/,/ROW FORMAT/s/.*`(.*)`.*/\1/p' file.txt

Combining the results into a single line, as desired, can be done with:

printf "%s , " $(sed -rn '/PARTITIONED BY/,/ROW FORMAT/s/.*`(.*)`.*/\1 /p' file.txt) |
   sed 's/ , $/\n/'
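
As a variation on the joining step, the sketch below uses paste to merge the extracted names and a final sed to widen the separator to " , ". The function name partition_cols and the shortened sample text are hypothetical, and sed -E is assumed to be available (it is accepted by both GNU and BSD sed as the portable spelling of -r).

```shell
#!/usr/bin/env bash
# partition_cols is a hypothetical helper; it reads the table text on stdin.
partition_cols() {
  # Within the PARTITIONED BY .. ROW FORMAT range, print only the backquoted name.
  sed -En '/PARTITIONED BY/,/ROW FORMAT/s/.*`(.*)`.*/\1/p' |
    paste -sd, - |        # join all names into one comma-separated line
    sed 's/,/ , /g'       # widen the separator to " , "
}

# Shortened, hypothetical sample of the SHOW CREATE TABLE output.
sample='| CREATE EXTERNAL TABLE `dv.par_kst`( |
|   `col15` string)                                  |
| PARTITIONED BY (                                   |
|   `part_col1` int,                                 |
|   `part_col2` int)                                 |
| ROW FORMAT SERDE                                   |'

joined=$(printf '%s\n' "$sample" | partition_cols)
echo "$joined"
```

Because the helper reads stdin, it can be used with a file (partition_cols < file.txt) or with command output piped into it.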


Polar Bear · Accepted Answer · 2020-04-22 20:53:25Z

-1

A small Perl script that:

reads the whole file into the $data variable
selects everything inside PARTITIONED BY (...)
collects into an array only the elements between backquotes (`)
prints the result joined with ", "

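use strict;
use warnings;
use feature 'say';

# Read the whole input (file argument or STDIN) into one string.
my $data = do { local $/; <> };
my $re   = qr/PARTITIONED BY \((.*?)\)/s;

# Copy the captured clause into its own variable before matching again,
# rather than matching against $1 directly.
my ($clause) = $data =~ $re
    or die "No PARTITIONED BY clause found\n";

# Collect every name enclosed in backquotes.
my @part = $clause =~ /`(.*?)`/g;

say join ', ', @part;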
