This is my main dataframe that I want to filter.

       first.seqnames  first.start  first.end                                        first.width first.strand second.seqnames  second.start  second.end                                       second.width second.strand
126457           chr1     10590184   10590618  GTTAATTATAGATAAATGGGCTAAAATTGCCTCTTGGTTTTGTAAC...            *            chr1      10730773    10731207  GTTAATTATAGATAAATGGGCTAAAATTGCCTCTTGGTTTTGTAAC...             *
126461           chr1     10590958   10591541  CTTTCTTTTGCATACTTGTAGATTTTTCTTCTACTCTGGTTTAGGA...            *            chr1      10731548    10732131  CTTTCTTTTGCATACTTGTAGATTTTTCTTCTACTCTGGTTTAGGA...             *
126544           chr1     10597414   10597918  ATCATTAGGAGATTATTAAAATTTGGAGTGTGTTGGCTGGCCTCGC...            *            chr1      10738018    10738522  ATCATTAGGAGATTATTAAAATTTGGAGTGTGTTGGCTGGCCTCGC...             *
126576           chr1     10600437   10600904  CTCGTTACCATGAAAGCTTTTTTAGCATTGATTTCATAACAGTCTT...            *            chr1      10741045    10741512  CTCGTTACCATGAAAGCTTTTTTAGCATTGATTTCATAACAGTCTT...             *
131172           chr1     11082133   11082593  TGAATCAGTGGTTTAATCTTCTTTGTTTACATCCCTTATTTCTTAT...            *            chr1      11245253    11245713  TGAATCAGTGGTTTAATCTTCTTTGTTTACATCCCTTATTTCTTAT...             *

This is the second dataframe, which holds the conditions I want to filter by:

  Chrom     Start       End
0  chr1  10590184  10590618
1  chr1  10590958  10591541
2  chr1  10597414  10597918

I tried the following to filter the rows, but it is wrong: each isin() call checks its column on its own, so the start and end values are not compared within the same row.

header_frame[header_frame['first.end'].isin(knee_df['End']) & header_frame['first.start'].isin(knee_df['Start'])]
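For illustration, here is a small sketch of why the two independent isin() checks can keep too many rows. The values are made-up toy numbers, not the real data; only the dataframe and column names follow the snippet above.

```python
import pandas as pd

# Toy data (hypothetical values, not the real dataframes).
header_frame = pd.DataFrame({
    "first.start": [100, 200, 100],
    "first.end":   [150, 250, 250],
})
knee_df = pd.DataFrame({"Start": [100, 200], "End": [150, 250]})

# Each isin() tests one column against one column, independently of the other.
mask = (header_frame["first.end"].isin(knee_df["End"])
        & header_frame["first.start"].isin(knee_df["Start"]))

# Row 2 is (100, 250): 100 is a valid Start and 250 is a valid End,
# but the pair (100, 250) is not a row of knee_df. It is kept anyway.
filtered = header_frame[mask]
print(filtered)
```

All three toy rows pass the filter, although only the first two actually occur as (Start, End) pairs in knee_df.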

I want to keep only the rows of the first dataframe that also exist in the second dataframe.

1 Answer

Assuming df1 and df2 are the two dataframes, you can use an inner merge on the three key columns:

df1.merge(df2,
          left_on=['first.seqnames', 'first.start', 'first.end'],
          right_on=['Chrom', 'Start', 'End'],
          how='inner'
         )[df1.columns]

output:

  first.seqnames  first.start  first.end                                        first.width first.strand second.seqnames  second.start  second.end                                       second.width second.strand
0           chr1     10590184   10590618  GTTAATTATAGATAAATGGGCTAAAATTGCCTCTTGGTTTTGTAAC...            *            chr1      10730773    10731207  GTTAATTATAGATAAATGGGCTAAAATTGCCTCTTGGTTTTGTAAC...             *
1           chr1     10590958   10591541  CTTTCTTTTGCATACTTGTAGATTTTTCTTCTACTCTGGTTTAGGA...            *            chr1      10731548    10732131  CTTTCTTTTGCATACTTGTAGATTTTTCTTCTACTCTGGTTTAGGA...             *
2           chr1     10597414   10597918  ATCATTAGGAGATTATTAAAATTTGGAGTGTGTTGGCTGGCCTCGC...            *            chr1      10738018    10738522  ATCATTAGGAGATTATTAAAATTTGGAGTGTGTTGGCTGGCCTCGC...             *
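Note that the merge output above is re-indexed from 0, so the original row labels (126457, ...) are lost. If those labels matter, one possible alternative is a row-wise mask built from the key triples. The sketch below uses small made-up stand-ins for df1 and df2 (only the column names follow the question) and is meant as an illustration rather than the definitive approach.

```python
import pandas as pd

# Toy stand-ins for df1 and df2 (hypothetical values).
df1 = pd.DataFrame({
    "first.seqnames": ["chr1", "chr1", "chr1", "chr1"],
    "first.start": [100, 200, 100, 300],
    "first.end": [150, 250, 250, 350],
    "second.start": [1100, 1200, 1100, 1300],
}, index=[126457, 126461, 126544, 126576])
df2 = pd.DataFrame({
    "Chrom": ["chr1", "chr1"],
    "Start": [100, 200],
    "End": [150, 250],
})

left_keys = ["first.seqnames", "first.start", "first.end"]
right_keys = ["Chrom", "Start", "End"]

# Inner merge as in the answer: keeps matching rows, index becomes 0..n-1.
merged = df1.merge(df2, left_on=left_keys, right_on=right_keys, how="inner")[df1.columns]

# Alternative: compare whole (seqnames, start, end) triples row by row.
# This keeps df1's original index.
wanted = list(df2[right_keys].itertuples(index=False, name=None))
mask = pd.MultiIndex.from_frame(df1[left_keys]).isin(wanted)
kept = df1[mask]

print(merged)
print(kept)
```

Both variants select the same rows here. One caveat with the merge: if df2 contains duplicate rows, each matching row of df1 is repeated once per duplicate, so calling df2.drop_duplicates() first may be worthwhile.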
2 Comments

You are almost right, but it creates NaN values when there is no match, so adding dropna() would make the result more precise.
Then rather try an inner merge ;)
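The NaN values mentioned in the first comment are what a left merge produces for unmatched rows; whether an earlier version of the answer used how='left' is an assumption, since it is not shown here. A minimal sketch with made-up toy values, comparing a left merge followed by dropna() with an inner merge:

```python
import pandas as pd

# Toy data (hypothetical values); the third row of df1 has no match in df2.
df1 = pd.DataFrame({"first.start": [100, 200, 300], "first.end": [150, 250, 350]})
df2 = pd.DataFrame({"Start": [100, 200], "End": [150, 250]})

keys = dict(left_on=["first.start", "first.end"], right_on=["Start", "End"])

# Left merge: every row of df1 is kept; unmatched rows get NaN in df2's columns.
left = df1.merge(df2, how="left", **keys)
left_filtered = left.dropna(subset=["Start", "End"])[df1.columns]

# Inner merge: unmatched rows are dropped directly, so no dropna() is needed.
inner = df1.merge(df2, how="inner", **keys)[df1.columns]

print(left)
print(inner)
```

In this toy case both routes end with the same two rows; the inner merge simply skips the intermediate NaN step.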
