Let's say you start with the following mock DataFrame:
In [21]: df = pd.DataFrame(
    ...:     {
    ...:         'appln_id': range(10),
    ...:         'cpc_class_symbol': [
    ...:             ['ABC', 'DEF', 'Y02_foo'],
    ...:             ['Y02_bar', 'ABC', 'DEF', 'XYZ'],
    ...:             ['ABC'],
    ...:             ['XYZ'],
    ...:             [],
    ...:             ['Y02'],
    ...:             ['ABC', 'Y02_foo'],
    ...:             ['ABC', 'XYZ'],
    ...:             ['ABC', 'DEF', 'XYZ'],
    ...:             ['Y02_foo', 'XYZ'],
    ...:         ],
    ...:     },
    ...: )
In [22]: df
Out[22]:
   appln_id          cpc_class_symbol
0         0       [ABC, DEF, Y02_foo]
1         1  [Y02_bar, ABC, DEF, XYZ]
2         2                     [ABC]
3         3                     [XYZ]
4         4                        []
5         5                     [Y02]
6         6            [ABC, Y02_foo]
7         7                [ABC, XYZ]
8         8           [ABC, DEF, XYZ]
9         9            [Y02_foo, XYZ]
If you use df['cpc_class_symbol'].explode(), you end up with a Series in which each list item gets its own row (an empty list becomes a single NaN row):
In [23]: df['cpc_class_symbol'].explode()
Out[23]:
0 ABC
0 DEF
0 Y02_foo
1 Y02_bar
1 ABC
1 DEF
1 XYZ
2 ABC
3 XYZ
4 NaN
5 Y02
6 ABC
6 Y02_foo
7 ABC
7 XYZ
8 ABC
8 DEF
8 XYZ
9 Y02_foo
9 XYZ
Name: cpc_class_symbol, dtype: object
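To see what explode does to the index and to empty lists in isolation, here is a minimal sketch on a made-up three-row Series (the variable names are just for illustration):

```python
import pandas as pd

# Three rows: a two-item list, an empty list, and a one-item list.
s = pd.Series([['ABC', 'Y02_foo'], [], ['XYZ']])
exploded = s.explode()

# The index label repeats once per list item, so row 0 appears twice.
index_labels = exploded.index.tolist()

# The empty list in row 1 is kept as a single NaN entry rather than dropped.
missing_labels = exploded[exploded.isna()].index.tolist()

print(index_labels)
print(missing_labels)
```

Because the original labels survive, you can always map the exploded items back to the row they came from.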
The index of the Series repeats the original row labels, so you can still tell which row each item came from. Now you can use the str accessor of the Series to check whether each item starts with a given string:
In [24]: df['cpc_class_symbol'].explode().str.startswith('Y02')
Out[24]:
0 False
0 False
0 True
1 True
1 False
1 False
1 False
2 False
3 False
4 NaN
5 True
6 False
6 True
7 False
7 False
8 False
8 False
8 False
9 True
9 False
Name: cpc_class_symbol, dtype: object
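Note the NaN in row 4 and the object dtype of the result. If you would rather get a plain boolean Series, str.startswith accepts an na argument that sets the value used for missing entries. A small sketch on made-up data:

```python
import pandas as pd

s = pd.Series([['ABC', 'Y02_foo'], [], ['XYZ']])

# na=False makes the NaN produced by the empty list count as "no match",
# so the result holds only True/False values.
flags = s.explode().str.startswith('Y02', na=False)

print(flags.tolist())
```

This is optional for the example in this answer, but it makes the handling of empty lists explicit instead of relying on how NaN is treated in the later aggregation.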
What you want now is to group this Series by its index and check whether any item belonging to a given index label is True. You can do that with Series.groupby:
In [25]: df['cpc_class_symbol'].explode().str.startswith('Y02').groupby(level=0).any().astype('int')
Out[25]:
0 1
1 1
2 0
3 0
4 0
5 1
6 1
7 0
8 0
9 1
Name: cpc_class_symbol, dtype: int64
Here, groupby(level=0) groups by the first level of the index; since your index has only one level, that simply means grouping by the index. You can assign the result back to the DataFrame with:
df['Y02_bin'] = df['cpc_class_symbol'].explode().str.startswith('Y02').groupby(level=0).any().astype('int')
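If you need this for several prefixes, the chain can be wrapped in a small helper. This is only a sketch: has_prefix is a hypothetical name, na=False is added so empty lists explicitly count as no match, and it assumes the DataFrame index is unique (duplicate labels would be merged into one group).

```python
import pandas as pd

def has_prefix(lists: pd.Series, prefix: str) -> pd.Series:
    """Return 1 where any item in the row's list starts with `prefix`, else 0."""
    flags = lists.explode().str.startswith(prefix, na=False)
    # One group per original row label; any() is True if at least one item matched.
    return flags.groupby(level=0).any().astype('int')

df = pd.DataFrame({
    'appln_id': range(4),
    'cpc_class_symbol': [['ABC', 'Y02_foo'], ['XYZ'], [], ['Y02']],
})

# The assignment aligns on the index, so each flag lands on its own row.
df['Y02_bin'] = has_prefix(df['cpc_class_symbol'], 'Y02')
print(df)
```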
This can be somewhat memory intensive, because explode creates one row per list item. For your use case, I think a plain Python loop over the Series (written here as a list comprehension) that collects the results in a list should be pretty efficient compared to using DataFrame.itertuples and assigning to each row individually.
That would look like this:
In [31]: [any(item.startswith('Y02') for item in row) for row in df['cpc_class_symbol']]
Out[31]: [True, True, False, False, False, True, True, False, False, True]
You can also assign this back to your original DataFrame:
In [34]: df['Y02_bin_loop'] = [int(any(item.startswith('Y02') for item in row)) for row in df['cpc_class_symbol']]
The results will be the same:
In [37]: df
Out[37]:
   appln_id          cpc_class_symbol  Y02_bin_loop  Y02_bin
0         0       [ABC, DEF, Y02_foo]             1        1
1         1  [Y02_bar, ABC, DEF, XYZ]             1        1
2         2                     [ABC]             0        0
3         3                     [XYZ]             0        0
4         4                        []             0        0
5         5                     [Y02]             1        1
6         6            [ABC, Y02_foo]             1        1
7         7                [ABC, XYZ]             0        0
8         8           [ABC, DEF, XYZ]             0        0
9         9            [Y02_foo, XYZ]             1        1
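If you want to confirm on your own data that both approaches agree rather than comparing the columns by eye, a check along these lines should work. It is a sketch on a small made-up frame; via_explode and via_loop are illustrative names, and na=False is an addition to the chain shown above.

```python
import pandas as pd

df = pd.DataFrame({
    'cpc_class_symbol': [['ABC', 'DEF', 'Y02_foo'], ['XYZ'], [], ['Y02']],
})

# Vectorised version: explode, test each item, then reduce per original row.
via_explode = (
    df['cpc_class_symbol']
    .explode()
    .str.startswith('Y02', na=False)
    .groupby(level=0)
    .any()
    .astype('int')
)

# Plain Python version: one any() per row, collected in a list.
via_loop = [
    int(any(item.startswith('Y02') for item in row))
    for row in df['cpc_class_symbol']
]

# Compare the values position by position.
same = via_explode.tolist() == via_loop
print(same)
```

The loop version assumes every row really holds a list of strings; a row containing NaN instead of a list would raise a TypeError there, whereas the explode version would treat it as no match.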