Unix: Split a file into two based on matched string

Question

I want to split a file into two, but cannot find a way to do this.

Master.txt
Happy Birthday to you!  [[#HAPPY]]
Stop it.  [[#COMMAND]]
Make a U-turn. [[#COMMAND]]

I want to split it into two files, with the second file receiving everything from the point where a line matches the pattern [[#

Output1.txt
Happy Birthday to you!
Stop it.
Make a U-turn.

Output2.txt
[[#HAPPY]]
[[#COMMAND]]
[[#COMMAND]]

I've tried using awk:

awk -v RS="[[#*" '{ print $0 > "temp" NR }'

but it doesn't give my desired output -- any help would be appreciated!

jaypal singh · Accepted Answer · 2014-03-14 02:16:50Z

4

Here is one way with GNU awk:

awk -v RS='\\[\\[#|\n' 'NR%2{print $0>"Output1.txt";next}{print "[[#"$0>"Output2.txt"}' master

Test:

$ ls
master

$ cat master 
Happy Birthday to you!  [[#HAPPY]]
Stop it.  [[#COMMAND]]
Make a U-turn. [[#COMMAND]]

$ awk -v RS='\\[\\[#|\n' 'NR%2{print $0>"Output1.txt";next}{print "[[#"$0>"Output2.txt"}' master

$ ls
master  Output1.txt  Output2.txt

$ head Out*
==> Output1.txt <==
Happy Birthday to you!  
Stop it.  
Make a U-turn. 

==> Output2.txt <==
[[#HAPPY]]
[[#COMMAND]]
[[#COMMAND]]
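
A small sketch of why the NR%2 test works, assuming an awk that accepts a regular expression as RS (GNU awk does). Every line is cut into two records, the text before [[# and the text after it, so odd-numbered records belong in Output1.txt and even-numbered ones in Output2.txt. The input is a shortened copy of the sample, records.txt is a made-up file name, and the <...> markers are only there to make trailing spaces visible.

```shell
# Shortened copy of the sample input.
printf '%s\n' 'Happy Birthday to you!  [[#HAPPY]]' 'Stop it.  [[#COMMAND]]' > master

# Same RS as in the answer; print each record with its record number.
awk -v RS='\\[\\[#|\n' '{ printf "%d:<%s>\n", NR, $0 }' master > records.txt
cat records.txt
```

Note that a line without [[# would produce only one record and shift the odd/even alternation for every line after it.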

2 Comments

anubhava Over a year ago

Super duper. I was thinking along the same lines, but you were too quick.

Juan Diego Godoy Robles Over a year ago

You can reduce the print calls; see my answer. Cheers.

chepner · Accepted Answer · 2014-03-14 03:25:54Z

1

A pure bash solution might be a little slower, but is very readable:

re='(.*)(\[\[#.*]])'    # group 1: text before the tag; group 2: the [[#...]] tag
while IFS= read -r line; do
    [[ $line =~ $re ]]
    printf '%s\n' "${BASH_REMATCH[1]}" >&3
    printf '%s\n' "${BASH_REMATCH[2]}" >&4
done < Master.txt 3> output1.txt 4> output2.txt

Ronak Patel · Accepted Answer · 2014-03-14 02:37:41Z

0

You can write a small script like this:

#!/bin/ksh

# Note: sed -i edits the input file in place.
sed -i -e 's/ \[\[#/,\[\[#/' "$1"

cut -d, -f1 "$1" > "$1.part1"
cut -d, -f2 "$1" > "$1.part2"

Or do it all on one command line:

sed -i -e 's/ \[\[#/,\[\[#/' Master.txt ; cut -d, -f1 Master.txt > output1.txt ; cut -d, -f2 Master.txt > output2.txt

1 Comment

NeronLeVelu Over a year ago

Be careful: this assumes that "," never appears in the first part of the text, since it is used as the cut delimiter, and that may not hold for the full file.
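
One way around the delimiter concern, sketched here with plain awk rather than sed and cut: search for the literal string [[# with index() and cut the line with substr(), so no character has to be reserved as a separator and the input file is not modified. The sample line containing a comma and the fallback for lines without a tag are assumptions made for illustration.

```shell
# Hypothetical input with a comma in the first part of a line.
printf '%s\n' 'Stop it, now.  [[#COMMAND]]' 'Make a U-turn. [[#COMMAND]]' > Master.txt

awk '{
    pos = index($0, "[[#")              # literal search, no regex escaping needed
    if (pos == 0) {                     # assumption: untagged lines go to part1 only
        print $0 > "Master.txt.part1"
        print "" > "Master.txt.part2"
        next
    }
    print substr($0, 1, pos - 1) > "Master.txt.part1"   # text before the tag
    print substr($0, pos)        > "Master.txt.part2"   # the tag and what follows
}' Master.txt
```

The first part keeps any whitespace that preceded the tag; strip it separately if that matters.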

Emmet · Accepted Answer · 2014-03-14 03:40:22Z

0

Simpler in sed, IMHO:

$ sed 's/^\([^[]*\).*/\1/' Master.txt > Output1.txt
$ sed 's/^[^[]*//'         Master.txt > Output2.txt

6 Comments

NeronLeVelu Over a year ago

The idea is good, but it does not answer the specific request, which asks to split the string at [[#, not at [

Emmet Over a year ago

Same thing for the sample input file.

NeronLeVelu Over a year ago

This is only a sample; you could have a line like Happy day to you [and your wife]! [[#HAPPY]] and that would fail. Your code simply assumes that there is no [ in the first part of the line.

Emmet Over a year ago

If this case can occur, it should be treated in the sample input/output.

NeronLeVelu Over a year ago

So why does he mention [[# if [ is enough? A sample does not always show every possible case. If the constraint is relaxed in that way, your code works.
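
For the case raised in these comments, a sketch of the same two-pass sed idea anchored on [[# instead of on any [. It assumes each line carries at most one [[# tag; lines without a tag pass through unchanged into both files. The input line is the one from the comment above.

```shell
# The problematic line from the comments: a plain [ before the tag.
printf '%s\n' 'Happy day to you [and your wife]! [[#HAPPY]]' > Master.txt

# Part 1: delete everything from the first [[# to the end of the line.
sed 's/\[\[#.*//' Master.txt > Output1.txt

# Part 2: the greedy .* consumes everything up to the last [[#, which is kept.
sed 's/.*\(\[\[#.*\)/\1/' Master.txt > Output2.txt
```

Output1.txt keeps the space that preceded the tag.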

NeronLeVelu · Accepted Answer · 2014-03-14 07:16:26Z

0

sed -n 's/\[\[#/\
&/;P
/\n/ {s/.*\n//;H;}
$ {x;s/\n//;w Output2.txt
  }' YourFile > Output1.txt

This does it in a single sed call, but awk is better suited to this task.

potong · Accepted Answer · 2014-03-14 08:35:40Z

0

This might work for you (GNU sed):

sed -n 's/\[\[#/\n&/;P;s/.*\n//w file3' file1 >file2

BMW · Accepted Answer · 2014-03-14 08:58:07Z

0

No need for GNU awk; this should work with any awk:

awk -F'\\[\\[#' '{print $1>"Output1.txt";print "[[#"$2>"Output2.txt"}' Master.txt

cat Output1.txt
Happy Birthday to you!
Stop it.
Make a U-turn.

cat Output2.txt
[[#HAPPY]]
[[#COMMAND]]
[[#COMMAND]]
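
With the field separator above, $1 still ends with whatever spaces preceded [[# in the input. If that matters, one possible variation (a sketch, not part of the original answer) is to let the separator also absorb the whitespace in front of the tag. The input is a shortened copy of the sample.

```shell
# Shortened copy of the sample input (note the two spaces before each tag).
printf '%s\n' 'Happy Birthday to you!  [[#HAPPY]]' 'Stop it.  [[#COMMAND]]' > Master.txt

# The separator is: optional spaces/tabs, then the literal [[#
awk -F'[ \t]*\\[\\[#' '{ print $1 > "Output1.txt"; print "[[#" $2 > "Output2.txt" }' Master.txt
```

As in the original command, this assumes every line contains exactly one [[# tag.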

edited Mar 14, 2014 at 8:58

BMW

answered Mar 14, 2014 at 6:10

Jotne
