Using the pandas library in Python, I have a line in my code that looks like this:
BadData = len(df[df.A1.str.contains('A|T|C|G')==False])
What I'm trying to do here is count the number of entries in the A1 column of the dataframe df that do not contain any combination of the letters A, T, C, and G.
These expressions should be counted as BadData:
- 123
- <%*&
- foo
But these expressions should not:
- A
- ATCG
- GATCATTA
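To make this reproducible, here is a minimal sketch of the current behaviour. The dataframe is a hypothetical stand-in built only from the six example entries above; my real df is larger, but the A1 column looks the same.

```python
import pandas as pd

# Hypothetical frame built from the six example entries listed above.
df = pd.DataFrame({"A1": ["123", "<%*&", "foo", "A", "ATCG", "GATCATTA"]})

# Same expression as in my code: keep rows where none of A, T, C, G appears,
# then count them.
BadData = len(df[df.A1.str.contains('A|T|C|G') == False])
print(BadData)  # 3 -> "123", "<%*&" and "foo"
```

So for these six entries the count comes out as I expect.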
My question: how could I use regex characters to include entries like "Apple" or "Golfing" in BadData?
I could chain together conditions like so:
BadData = len(df[(df.A1.str.contains('A|T|C|G')==False) & (df.A1.str.contains('0|1|2|3')==True)])
But here I face a difficulty: do I have to define every character that violates the condition? This seems clumsy, and I am sure there is a more elegant way.
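One direction I have been experimenting with, though I am not sure it is the idiomatic answer, is a negated character class: rather than listing every offending character, flag any entry that contains at least one character outside the set A, T, C, G. The sketch below assumes a hypothetical dataframe made from the examples in this question, and it assumes A1 has no missing values or empty strings (an empty string would not be flagged this way).

```python
import pandas as pd

# Hypothetical frame: the six earlier examples plus "Apple" and "Golfing".
df = pd.DataFrame({"A1": ["123", "<%*&", "foo", "A", "ATCG", "GATCATTA",
                          "Apple", "Golfing"]})

# Current approach: "Apple" and "Golfing" slip through because they
# contain an A or a G somewhere.
current_count = len(df[df.A1.str.contains('A|T|C|G') == False])

# Candidate approach: [^ATCG] matches any single character that is NOT
# A, T, C or G, so a row is flagged if it has at least one such character.
has_other_char = df.A1.str.contains(r'[^ATCG]')
candidate_count = int(has_other_char.sum())

print(current_count)    # 3
print(candidate_count)  # 5 -> now includes "Apple" and "Golfing"
```

This gives the count I want on the examples, but I would still like to know whether it is the recommended way to do it in pandas.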