
I have 135 documents stored as 135 lines (each line is one long text) in File_A, and I have 15 phrases in File_B. I need to extract each sentence, together with the sentence before it, from File_A wherever it matches a phrase in File_B. The sentences extracted from File_A line 1 should go to a new file File_1; similarly, the sentences extracted from File_A line 2 should go to a new file File_2, and so on until I have extracted matching sentences from all the lines. I did this with the following code:

i=1
while read line; do
 while read row; do
   cat "$line" | sed 's/\./.\n/g' | grep -i -B 1 "$row"  | tr -d '\n' |  sed 's/--/\n/g'    >> file_$i
 done < $2 
 $i = $i+1;
done < $1

The problem here is that the output is being printed to the console but not to the new file. Could someone help me see my error?

Thank you

  • You mean your code actually works but prints the output to the console? I would have thought cat "$line" would fail straight away, since $line is a line of text and not a filename. Commented Oct 27, 2012 at 3:41
  • I tried your code, and all it does is output a series of errors like cat: something something...: No such file or directory and command not found Commented Oct 27, 2012 at 3:51

6 Answers


Is this clear? If not, comment on it and I will edit it. A Bash output-redirection example:

echo "some text" >file.txt;
#here we add on to the end of the file instead of overwriting the file
echo "some additional text" >>file.txt;
#put something in two files and output it
echo "two files and console" | tee file1.txt | tee file2.txt;
#put something in two files and output nothing
echo "just two files" | tee file1.txt >file2.txt;

2 Comments

Are you able to append something to the end of two files, similar to the last example? Using >> doesn't work; it only writes to the second file.
Nvm, found tee -a
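As the last comment notes, tee -a appends to every file argument instead of overwriting. A minimal check, run in a throwaway directory (the file names here are illustrative):

```shell
cd "$(mktemp -d)"                      # scratch directory so nothing real is touched

echo "first"  | tee -a a.txt b.txt >/dev/null
echo "second" | tee -a a.txt b.txt >/dev/null

cat a.txt                              # both lines appear, in order, in each file
```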

tee actually accepts multiple file arguments, so it is as simple as:

# from file
tee 1.txt 2.txt 3.txt <0.txt

# from string
tee 1.txt 2.txt 3.txt <<<'text'

# from heredoc
tee 1.txt 2.txt 3.txt <<'EOF'
line
line
line
EOF

# from pipeline
command | tee 1.txt 2.txt 3.txt
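Since tee writes the same bytes to every file it names, the copies can be verified with cmp (again in a scratch directory; file names are arbitrary):

```shell
cd "$(mktemp -d)"

printf 'line1\nline2\n' | tee 1.txt 2.txt 3.txt >/dev/null

# all three files now hold identical content
cmp -s 1.txt 2.txt && cmp -s 2.txt 3.txt && echo "all copies identical"
```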



Fixing the previously mentioned problems (the incrementing of i and the misuse of cat) leads to something like the following. Note that the line date > file_$i is there for debugging, to ensure each output file starts fresh at the beginning of a test. The : command is a no-op. The form <<< introduces a here-string (not a here-document). If the content of $line is a file name, instead of being a document as specified in the question, use <"$line" in place of <<<"$line".

#!/bin/bash
i=1
while read line; do
    date > file_$i
    while read row; do
        sed 's/\./.\n/g' <<< "$line" | grep -iB1 "$row" | tr -d '\n' | sed 's/--/\n/g' >> file_$i
    done < $2
    : $((i++))
done < $1

Given splitdoc.data containing the following:

This is doc 1.  I am 1 fine.  How are you, 1.? Ok. Hello 1.--  Go away now.
This is doc 2.  I am 2 fine.  How are you, 2.? Ok. Hello 2.--  Go away now.
This is doc 3.  I am 3 fine.  How are you, 3.? Ok. Hello 3.--  Go away now.
This is doc 4.  I am 4 fine.  How are you, 4.? Ok. Hello 4.--  Go away now. 

and splitdoc.tags with the following:

How are you
Go away now

Then the command

./splitdoc.sh splitdoc.data splitdoc.tags ; head file_*

produces:

==> file_1 <==
Fri Oct 26 19:42:00 MDT 2012
  I am 1 fine.  How are you, 1. Hello 1.
  Go away now.
==> file_2 <==
Fri Oct 26 19:42:00 MDT 2012
  I am 2 fine.  How are you, 2. Hello 2.
  Go away now.
==> file_3 <==
Fri Oct 26 19:42:00 MDT 2012
  I am 3 fine.  How are you, 3. Hello 3.
  Go away now.

5 Comments

I believe the lines in $1 are meant to be filenames that are grepped, not the content itself, so <<< is not appropriate.
@MarkReed, the question says each line is a “document”, which is ambiguous but, as you suggest, might be a filename instead of a document. I added a note before the code.
line would be a long text; it's actually an accident report. Each report is made into a line
May I know what is head file_* in this command: ./splitdoc.sh splitdoc.data splitdoc.tags ; head file_*
@Santosh, without options the head command prints the filenames and the first 10 lines of the files listed on the command line. (Doesn't print file name if only 1 file is listed.) See man head

I think this will work

i=1
while read line; do
 while read row; do
   echo "$line" | sed 's/\./.\n/g' | grep -i -B 1 "$row" | tr -d '\n' | sed 's/--/\n/g' >> file_$i
 done < $2
 i=$((i+1))
done < $1

a=0
while read line; do
 a=$(($a+1))
 while read row; do
   echo "$line" | sed 's/\./.\n/g' | grep -i -B 1 "$row" | tr -d '\n' | sed 's/--/\n/g' >> file_$a
 done < $2
done < $1

6 Comments

Hi, thanks for the quick reply, but it is not working. The output is on the console but not in a new file
do you have an example of what $line would be?
line would be a long text; it's actually an accident report. The report is made into 1 line
This code is actually now creating multiple files, but I think there is some mistake with the cat command. It displays text on the console, and there is no output on the new files a=0 while read line; do a=$(($a+1)); while read row; do < "$line" | sed 's/\./.\n/g' | grep -i -B 1 "$row" | tr -d '\n' | sed 's/--/\n/g' >> file_$a done < $2 done < $1
if $line is a line of text and not a file, this line is working for me: echo "$line" | sed 's/\./.\n/g' | grep -i -B 1 "$row" | tr -d '\n' | sed 's/--/\n/g' >> file_$a

This is not how you increment a variable in the shell:

$i = $i + 1

That instead tries to run a command whose name is the current value of $i. You want this:

let i=i+1

or, more concisely,

let i+=1

This may not be the problem, but it is a problem, and it can lead to odd behavior.
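For the record, the counter can be bumped several equivalent ways; let and ((…)) are bash/ksh builtins, while $((…)) works in any POSIX sh:

```shell
i=0
let i=i+1      # bash/ksh builtin
let i+=1       # same thing, more concise
i=$((i+1))     # POSIX arithmetic expansion, works in plain sh
((i++))        # bash arithmetic command
echo "$i"      # 4
```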

The only other thing I see is a lack of quotation marks around your filenames ("$1", "$2").

Also, if each line is a filename, you don't need cat; just do

<"$line" sed ...

If each line is the contents of a file instead of the name, then cat is entirely wrong, as it tries to find a file whose name is that big long text. You can use this instead:

<<<"$line" sed ...
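A minimal sketch of the two cases, using a GNU sed command like the one in the question (the sample text is made up):

```shell
cd "$(mktemp -d)"
printf 'one. two. three.' > doc.txt

# case 1: $line names a file -- redirect the file into sed
line=doc.txt
<"$line" sed 's/\./.\n/g'

# case 2: $line holds the content itself -- feed it in as a here-string
line='one. two. three.'
<<<"$line" sed 's/\./.\n/g'
```

Both runs split the text at each period; only the source of sed's input differs.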

EDIT Also, if there aren't that many lines in fileB, you might be able to avoid reading it over and over again for every file listed in fileA. Just read all of fileB into memory at once:

IFS=$'\n' rows=($(<"$2"))
let i=0
while read line; do
  for row in "${rows[@]}"; do
    <<<"$line" sed 's/\./.\n/g' | grep -i -B 1 "$row"  | 
             tr -d '\n' |  sed 's/--/\n/g' >> file_$i
  done 
  let i+=1
done < "$1"

In fact, you may even be able to do it in a single grep:

pat=''
while read row; do
  pat="${pat:+$pat|}$row"
done <"$2"

let i=0
while read line; do
  <<<"$line" sed 's/\./.\n/g' | egrep -i -B 1 "$pat"  | 
             tr -d '\n' |  sed 's/--/\n/g' >"file_$i"
  let i+=1
done < "$1"
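The pattern-building loop above just joins the phrases with |; as an aside, paste -s -d can do the same join in one step (the file name and sample phrases here are made up):

```shell
cd "$(mktemp -d)"
printf 'How are you\nGo away now\n' > fileB

# join all lines of fileB into one alternation pattern
pat=$(paste -s -d '|' fileB)           # How are you|Go away now

# grep -Eo prints each phrase that matches, one per line
echo "Hello. How are you? Fine. Go away now." | grep -Eo "$pat"
```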

3 Comments

Thank you, I realized that it should be something like you said.
a=0 while read line; do a=$(($a+1)); while read row; do < "$line" | sed 's/\./.\n/g' | grep -i -B 1 "$row" | tr -d '\n' | sed 's/--/\n/g' >> file_$a done < $2 done < $1
Thank you, Reed. I have about 10 phrases in my File_B, so I would go with your second code. Thank you for making me understand incrementing in shell scripting.

To expand a bit on @BenjiWiebe's answer.

One can also discard the stdout of tee with:

echo "something" | tee file1.txt file2.txt file3.txt 1>/dev/null

Although this way it is not possible to mix overwriting with appending (unless piped again to another tee).

Use this to mix overwrite and append:

# overwrite file1.txt and append to file2.txt and file3.txt
echo "something" | tee file1.txt | tee -a file2.txt file3.txt 1>/dev/null
# same as
echo "something" | tee -a file2.txt file3.txt > file1.txt

In the end, tee and Unix pipes are quite flexible; one can decide which combination makes the most sense in a given script.

