I have over forty files with the following structure:
file1 first 21 lines
8191 M0
139559 M1
79 M10
1 M10007
1 M1006
1 M10123
file2 first 21 lines
8584 M0
119837 M1
72 M10
1 M10003
1 M10045
1 M1014
file3 first 21 lines
9090 M0
137373 M1
73 M10
1 M10046
2 M101
1 M1039
where the first column is the number of occurrences of an M pattern and the second column is the pattern itself, tab-separated. The M patterns are only partly shared across the files, and each file contains roughly 700-800 of them in total.
What I wish to do is use AWK to extract only the patterns common to all forty-plus files (say ~600) along with their counts (the first column). Ideally, the final file will have one count column per file plus one last column holding the shared M pattern, in no particular order, as I can then sort on that last column.
Something like this I imagine:
| file1 | file2 | file3 | M-pattern |
|---|---|---|---|
| 8191 | 8584 | 9090 | M0 |
| 139559 | 119837 | 137373 | M1 |
| 79 | 72 | 73 | M10 |
In theory, since AWK processes the files sequentially, I should be able to add a header afterward that reflects the order in which the files were given, using something like sed. Any help is much appreciated, thanks in advance!
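For the header itself, a minimal sketch of what I have in mind, assuming the file names are passed in the same order as they are given to AWK. The helper name `make_header` and the `M-pattern` label are just placeholders I made up:

```shell
# make_header is a hypothetical helper: it prints one tab-terminated
# name per argument, then the label of the final pattern column.
make_header() {
    printf '%s\t' "$@"      # the format is reused for every file name
    printf 'M-pattern\n'
}

make_header file1 file2 file3
```

The line it prints could then be written to the output file before the table rows.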
So far, I have tried the following, found in other related answers:
awk 'FNR==NR{a[$0];next} $0 in a' one two
but it does not seem to act on column $2, where my M patterns are, and I do not think I have understood how to modify it to do so.
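To make the goal concrete, here is a rough, untuned sketch of the kind of thing I am after. It keys on `$2` instead of `$0`, counts in how many files each pattern appears, and prints only the patterns present in every file. The function name `common_counts` and the temporary sample files are made up for illustration, and default whitespace splitting is assumed so that both tabs and spaces work as separators:

```shell
# common_counts is a hypothetical helper name, not an existing tool.
common_counts() {
    awk '
    FNR == 1 { nfiles++ }                  # first line of each non-empty input file
    {
        if (!(($2, nfiles) in count))      # first time this pattern appears in this file
            present[$2]++
        count[$2, nfiles] = $1             # remember the count for (pattern, file)
    }
    END {
        for (p in present) {
            if (present[p] != nfiles)      # skip patterns missing from any file
                continue
            row = ""
            for (i = 1; i <= nfiles; i++)  # counts in command-line file order
                row = row count[p, i] "\t"
            print row p                    # pattern goes in the last column
        }
    }' "$@"
}

# Small sample files built from the first lines shown above
tmp=$(mktemp -d)
printf '8191\tM0\n139559\tM1\n79\tM10\n1\tM10007\n' > "$tmp/file1"
printf '8584\tM0\n119837\tM1\n72\tM10\n1\tM10003\n' > "$tmp/file2"
printf '9090\tM0\n137373\tM1\n73\tM10\n1\tM10046\n' > "$tmp/file3"

# Rows come out in no particular order, so sort on the pattern column
result=$(common_counts "$tmp/file1" "$tmp/file2" "$tmp/file3" | sort -k4,4)
printf '%s\n' "$result"
rm -r "$tmp"
```

With forty-plus files the sort key would be the last column rather than column 4, and everything is held in memory until the END block, which should be fine for 700-800 patterns per file.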
M0 is found? sort and unique -c to get those final output I wish to process.