Using grep
$ head -n1 <file; grep -E "(^|,)($(tr '\n' '|' <inclusion))(,|$)" file | grep -Ev "(^|,)($(tr '\n' '|' <exclusion))(,|$)"
D1,D2,D3,D4,A,B,C,D,E,F
123,00,145,567,A1,B1,C1,D1,E1,F1
567,1250,010,321,A4,B4,C4,D4,E4,F4
Using awk
$ awk -v inc="(^|,)($(tr '\n' '|' <inclusion))(,|$)" -v exc="(^|,)($(tr '\n' '|' <exclusion))(,|$)" 'NR==1 || ($0 ~ inc && ! ($0 ~ exc))' file
D1,D2,D3,D4,A,B,C,D,E,F
123,00,145,567,A1,B1,C1,D1,E1,F1
567,1250,010,321,A4,B4,C4,D4,E4,F4
How it works
For both the grep and awk solutions, the key step is building a regular expression that matches any of the values listed in the inclusion or exclusion file. Because it is shorter, let's take exclusion as an example. We can create a regex for it as follows:
$ echo "(^|,)($(tr '\n' '|' <exclusion))(,|$)"
(^|,)(456|457|458|459|)(,|$)
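Note that tr also converts the final newline, which is why the regex above ends in `|)`. That empty alternative would additionally match an empty field (for example `,,`). The sample data has no empty fields, so the results shown here are unaffected. As a minimal sketch, assuming the list files hold one value per line with no blank lines, `paste -sd'|'` joins the lines without a trailing separator. The helper name `mkregex` is hypothetical, and the list is written to a throwaway directory:

```shell
# mkregex (hypothetical helper): build a whole-field regex from a list file.
# paste -s joins all lines into one; -d'|' uses | as the separator,
# so no trailing | (empty alternative) is produced.
mkregex() {
    printf '(^|,)(%s)(,|$)' "$(paste -sd'|' "$1")"
}

# Demo on a temporary copy of the exclusion list shown above.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' 456 457 458 459 > "$tmp/exclusion"

mkregex "$tmp/exclusion"; echo
# prints (^|,)(456|457|458|459)(,|$)
```

The resulting string can be dropped into the grep or awk commands in place of the `tr`-based one.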
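The regex for inclusion is built the same way. The surrounding (^|,) and (,|$) groups anchor each value to a whole comma-separated field, so 456 does not match inside 1456. Once the include and exclude regexes have been created, we can use them with either grep or awk. With grep, head -n1 prints the header, grep -E keeps the lines that match the inclusion regex, and grep -Ev drops those that match the exclusion regex. With awk, we use the condition: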
NR==1 || ($0 ~ inc && ! ($0 ~ exc))
If this condition is true, awk performs its default action, which is to print the line. The condition is true if (1) we are on the first line (NR==1), or (2) the line matches the inclusion regex, inc, and does not match the exclusion regex, exc.
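To see the condition in action, here is a small self-contained sketch. The contents of inclusion and most of file are assumptions made up for the demonstration (only the exclusion values and the two surviving data rows appear in this answer). The regexes are built with the same `tr` approach as above.

```shell
# Hypothetical sample data: only the exclusion values and the two rows that
# survive the filter come from the answer; the rest is invented for the demo.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' 567 568 > "$tmp/inclusion"
printf '%s\n' 456 457 458 459 > "$tmp/exclusion"
cat > "$tmp/file" <<'EOF'
D1,D2,D3,D4,A,B,C,D,E,F
123,00,145,567,A1,B1,C1,D1,E1,F1
456,00,567,111,A2,B2,C2,D2,E2,F2
789,11,222,333,A3,B3,C3,D3,E3,F3
567,1250,010,321,A4,B4,C4,D4,E4,F4
EOF

# Build both regexes from the list files.
inc="(^|,)($(tr '\n' '|' < "$tmp/inclusion"))(,|$)"
exc="(^|,)($(tr '\n' '|' < "$tmp/exclusion"))(,|$)"

# Keep the header, plus lines that match inc and do not match exc.
result=$(awk -v inc="$inc" -v exc="$exc" \
    'NR==1 || ($0 ~ inc && ! ($0 ~ exc))' "$tmp/file")
printf '%s\n' "$result"
```

In this made-up input, the third line contains both an included value (567) and an excluded one (456) and is dropped, and the fourth line contains neither and is dropped as well, so only the header and the two rows shown earlier are printed.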
Alternate awk solution
$ gawk -F, -v inc="$(<inclusion)" -v exc="$(<exclusion)" 'BEGIN{n=split(inc,x,"\n"); for (j=1;j<=n;j++)incl[x[j]]=1; n=split(exc,x,"\n"); for (j=1;j<=n;j++)excl[x[j]]=1;} NR==1{print;next} {p=0;for (j=1;j<=NF;j++) if ($j in incl)p=1; for (j=1;j<=NF;j++) if ($j in excl) p=0;} p' file
D1,D2,D3,D4,A,B,C,D,E,F
123,00,145,567,A1,B1,C1,D1,E1,F1
567,1250,010,321,A4,B4,C4,D4,E4,F4
The same code written out over multiple lines looks like:
gawk -F, -v inc="$(<inclusion)" -v exc="$(<exclusion)" '
BEGIN{
    n=split(inc,x,"\n")
    for (j=1;j<=n;j++) incl[x[j]]=1
    n=split(exc,x,"\n")
    for (j=1;j<=n;j++) excl[x[j]]=1
}
NR==1{
    print
    next
}
{
    p=0
    for (j=1;j<=NF;j++) if ($j in incl) p=1
    for (j=1;j<=NF;j++) if ($j in excl) p=0
}
p
' file
The above creates the arrays incl and excl from the inclusion and exclusion data. Any line with a field in incl is marked for printing (p=1). If, however, the line also contains a field in excl, p is reset to false (p=0), so exclusion takes precedence. The final pattern, p, prints the line whenever p is nonzero.
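As a variation on the same idea, the lists can also be read inside awk with getline. This avoids the bash-style $(<file) expansion and avoids passing multi-line values through -v, which some awk implementations may reject. This is only a sketch under assumptions: the variable names incfile and excfile are made up here, and the sample data is hypothetical apart from the exclusion values and the two matching rows shown earlier.

```shell
# Hypothetical sample data, written to a throwaway directory.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' 567 568 > "$tmp/inclusion"
printf '%s\n' 456 457 458 459 > "$tmp/exclusion"
cat > "$tmp/file" <<'EOF'
D1,D2,D3,D4,A,B,C,D,E,F
123,00,145,567,A1,B1,C1,D1,E1,F1
456,00,567,111,A2,B2,C2,D2,E2,F2
789,11,222,333,A3,B3,C3,D3,E3,F3
567,1250,010,321,A4,B4,C4,D4,E4,F4
EOF

result=$(awk -F, -v incfile="$tmp/inclusion" -v excfile="$tmp/exclusion" '
BEGIN {
    # Load each list file into an array keyed by value.
    while ((getline v < incfile) > 0) incl[v] = 1
    while ((getline v < excfile) > 0) excl[v] = 1
}
NR == 1 { print; next }          # always keep the header
{
    p = 0
    for (j = 1; j <= NF; j++) {
        if ($j in excl) next     # an excluded field vetoes the whole line
        if ($j in incl) p = 1    # an included field marks it for printing
    }
}
p
' "$tmp/file")
printf '%s\n' "$result"
```

Skipping the line with next as soon as an excluded field is found gives the same result as resetting p to 0 in a second loop, since exclusion takes precedence either way.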