1

I need a script to edit files. I'm going a bit crazy about this already :).

I've got two files:

143956;lorem 
143957;ipsum
143958;lala
143959;vuvu

and the second:

512;143956;15
2156;143957;15
153;143958;4968
2156;143959;486

And what I need is to put those two files together in this way:

512;143956;lorem;15
2156;143957;ipsum;15
153;143958;lala;4968
2156;143959;vuvu;486

That doesn't look that difficult, and paste would probably be enough, but there's a catch: some rows appear in only one of the files and not in the other. In that case I need to hold my place in one file and keep comparing against the next lines of the other until the key matches.

Example:

143956;lorem 
143957;ipsum
143959;vuvu //here "lala" (143958) is missing; this line will be compared with the 3rd line of the second file (143958), but the script shouldn't declare it "not found". It should keep searching until it finds 143959 (which is on line 4 in this case).

512;143956;15
2156;143957;15
153;143958;4968
2156;143959;486

The output would look like this then:

512;143956;lorem;15
2156;143957;ipsum;15
2156;143959;vuvu;486

Or, better, this way:

512;143956;lorem;15
2156;143957;ipsum;15
153;143958;*WAS NOT FOUND*;4968
2156;143959;vuvu;486

But I can finish that on my own...

Hope this is understandable. Thanks very much for any help.

4
  • Thanks for posting your example input including an edge case. Could you also post the expected output for this case? Commented Nov 13, 2010 at 22:13
  • Does it have to be in bash or could we switch to a proper programming language like python? Commented Nov 13, 2010 at 22:22
  • Thanks for the reply, Mark. I edited the question to include the edge-case output. Commented Nov 13, 2010 at 22:25
  • It can be done in anything I can run, I think. I am just trying to learn something new in bash, but I won't mind any working solution :). Commented Nov 13, 2010 at 22:26

2 Answers

1

Using Bash process substitution (<()) and the join utility:

join -t \; -1 1 -2 2 -o 2.1,2.2,1.2,2.3 <(sort file1) <(sort -t \; -k2,2 file2)

Or you can presort the files.
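For what presorting could look like, here is a minimal sketch using the sample data from the question. The scratch directory and the `.sorted` file names are made up for illustration, and `LC_ALL=C` is my assumption to keep `sort` and `join` agreeing on collation order.

```shell
# Sample data from the question, written to a scratch directory (hypothetical paths).
workdir=$(mktemp -d)
printf '%s\n' '143956;lorem' '143957;ipsum' '143958;lala' '143959;vuvu' > "$workdir/file1"
printf '%s\n' '512;143956;15' '2156;143957;15' '153;143958;4968' '2156;143959;486' > "$workdir/file2"

# Sort each file once on its join field (field 1 of file1, field 2 of file2).
LC_ALL=C sort -t ';' -k1,1 "$workdir/file1" > "$workdir/file1.sorted"
LC_ALL=C sort -t ';' -k2,2 "$workdir/file2" > "$workdir/file2.sorted"

# Same join as above, now reading the presorted files directly.
presorted_out=$(LC_ALL=C join -t ';' -1 1 -2 2 -o 2.1,2.2,1.2,2.3 \
    "$workdir/file1.sorted" "$workdir/file2.sorted")
printf '%s\n' "$presorted_out"
```

This avoids re-sorting on every run and does not need process substitution, so it should also work in a plain `sh`.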

To output the records that appear in file2 but don't appear in file1:

join -t \; -1 1 -2 2 -v 2 -o 2.1,2.2,1.2,2.3 <(sort file1) <(sort -t \; -k2,2 file2) | sed 's/;;/;*WAS NOT FOUND*;/'
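If you want matched and unmatched records together in one pass, `join` also has `-a 2` (also print unpairable lines from the second file) and `-e` (a filler string for missing fields in the `-o` list). A sketch using the edge-case data from the question; the scratch directory and file names are made up, and note that `-e` would also fill any field that is genuinely empty in the input.

```shell
# Edge-case data from the question: 143958 is missing from file1.
workdir=$(mktemp -d)
printf '%s\n' '143956;lorem' '143957;ipsum' '143959;vuvu' > "$workdir/file1"
printf '%s\n' '512;143956;15' '2156;143957;15' '153;143958;4968' '2156;143959;486' > "$workdir/file2"

# Sort both inputs on their join fields, using the same collation for sort and join.
LC_ALL=C sort -t ';' -k1,1 "$workdir/file1" > "$workdir/file1.sorted"
LC_ALL=C sort -t ';' -k2,2 "$workdir/file2" > "$workdir/file2.sorted"

# -a 2 keeps file2 lines without a partner; -e fills the missing 1.2 field.
merged_out=$(LC_ALL=C join -t ';' -1 1 -2 2 -a 2 -e '*WAS NOT FOUND*' \
    -o 2.1,2.2,1.2,2.3 "$workdir/file1.sorted" "$workdir/file2.sorted")
printf '%s\n' "$merged_out"
```

The output comes out ordered by the join key rather than in the original order of file2, which happens to be the same thing for this sample.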

3 Comments

Great, it looks like it's working! I am gonna try it fully tomorrow - need some sleep now. But it looks really great. Thanks
There's just one thing: I am getting a double \n over there, because we are using the end parts in both of the files. Can I get rid of that?
@aGr: Can you explain more about what's in the source data and the result that is related to the problem? Where are the extra newlines in the output? What does "using the end parts in both of the files" mean? Is this happening using both of the commands I gave or if only one, which one?
1

If the first file isn't too large, you can do (test1 and test2 are two files in the order you specified):

#!/bin/sh

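# Note: word-splitting on `cat` output assumes the lines contain no whitespace.
for line in `cat test2`; do
  # Key = the middle, all-digit field of the test2 line.
  number=`echo "$line" | grep -o ";[0-9]*;" | sed 's/;//g'`
  # Matching "key;text" line from test1; anchored so 3956 cannot match 143956.
  repl=`grep "^$number;" test1`
  if [ -z "$repl" ]; then
    echo "$line" | sed "s#;$number;#;$number;*WAS NOT FOUND*;#g"
  else
    echo "$line" | sed "s#;$number;#;$repl;#g"
  fi
done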

2 Comments

Unfortunately it is - about 30 MB. I am getting this error "./sc: 11: sed: Argument list too long" when I sorted the files. Before, when I didn't sort them, I got some results, but it wrote a different message - I can write it here, but sorting is the right thing anyway, isn't it?
Although the data in the question doesn't contain any spaces, it's a bad habit to use for $(cat file) because this will break each word onto a separate line. The correct way to do this is while read -r line; do ... done < file. And you're right about large files. Calling that many utilities on each line in a file could be incredibly slow. Also, you've got $number twice. There should be two different ones. That would mean yet another echo|grep|sed. And yet another one to parse the needed string from test1.
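One way to avoid spawning several utilities per line is to do the whole lookup in a single awk process. This is only a sketch under some assumptions: awk is available, the key is field 1 of test1 and field 2 of test2, and test1 fits in memory as a lookup table (about 30 MB should usually be fine). The scratch directory is made up for the example.

```shell
# Edge-case data from the question: 143958 is missing from test1.
workdir=$(mktemp -d)
printf '%s\n' '143956;lorem' '143957;ipsum' '143959;vuvu' > "$workdir/test1"
printf '%s\n' '512;143956;15' '2156;143957;15' '153;143958;4968' '2156;143959;486' > "$workdir/test2"

awk_out=$(awk -F ';' -v OFS=';' '
    # First file (test1): remember key -> text, then move to the next line.
    NR == FNR { text[$1] = $2; next }
    # Second file (test2): insert the looked-up text, or a marker if the key is unknown.
    { print $1, $2, (($2 in text) ? text[$2] : "*WAS NOT FOUND*"), $3 }
' "$workdir/test1" "$workdir/test2")
printf '%s\n' "$awk_out"
```

Unlike the sorted `join` approach, this keeps the original line order of test2 and needs no sorting.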
