How do I split a list in a column into two columns in a DataFrame using Python? For example:

  row  |  column_A                
  ==================================
  1    |[('Ahli', 'NNP'),          |
       | ('paleontologi', 'NNP'),  | 
       | ('Thomas', 'NNP'),        |
       | ('dan', 'CC'),            |
       | ('timnya', 'RB'),         |
       | ('.', 'Z')],              |
  2    |[('fosil', 'NN'),          |
       | ('mamalia', 'NN'),        |
       | ('yang', 'SC'),           |
       | ('menghuni', 'VB'),       |
       | ('Antartika', 'NNP')]     |

I want to get only the second string from each tuple in the list:

  row  |  column_A                 | postag
  ============================================
  1    |[('Ahli', 'NNP'),          |[('NNP'),
       | ('paleontologi', 'NNP'),  | ('NNP'),
       | ('Thomas', 'NNP'),        | ('NNP'),
       | ('dan', 'CC'),            | ('CC'),
       | ('timnya', 'RB'),         | ('RB'),
       | ('.', 'Z')],              | ('Z')],
  2    |[('fosil', 'NN'),          |[('NN'),
       | ('mamalia', 'NN'),        | ('NN'),
       | ('yang', 'SC'),           | ('SC'),
       | ('menghuni', 'VB'),       | ('VB'),
       | ('Antartika', 'NNP')]     | ('NNP')]

4 Answers

2

Adding to @Biranchi's answer, the correct answer would be

df['postag'] = df['column_A'].apply(lambda x: [(i[1],) for i in x])

Result would be

# print(df)

                                        column_A                      postag
0  [(Ahli, NNP), (paleontologi, NNP), (Thomas, NN...  [(NNP,), (NNP,), (NNP,), ...
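One detail worth noting about the notation: in Python, `('NNP')` is just the string `'NNP'`, because parentheses alone do not make a tuple. The trailing comma in `(i[1],)` is what produces a one-element tuple. A minimal sketch in plain Python, with variable names chosen only for illustration:

```python
# Parentheses alone do not create a tuple; the trailing comma does.
plain = ('NNP')      # identical to the string 'NNP'
single = ('NNP',)    # a one-element tuple

pairs = [('Ahli', 'NNP'), ('dan', 'CC')]
as_tuples = [(p[1],) for p in pairs]   # list of one-element tuples
as_strings = [p[1] for p in pairs]     # list of plain strings

print(type(plain).__name__, type(single).__name__)
print(as_tuples)
print(as_strings)
```

If plain strings are enough, the comprehension without the trailing comma is probably the simpler choice.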

1 Comment

@Biranchi - if you like the answer, please upvote it :)
1

Use Series.map to apply a custom mapping function that transforms each list in column_A as required:

df['postag'] = df['column_A'].map(lambda l: [b for a, b in l])

Another possible idea:

df['postag'] = [[y for x, y in lst] for lst in df['column_A']]

Result:

# print(df)

                                            column_A                      postag
0  [(Ahli, NNP), (paleontologi, NNP), (Thomas, NN...  [NNP, NNP, NNP, CC, RB, Z]
1  [(fosil, NN), (mamalia, NN), (yang, SC), (meng...       [NN, NN, SC, VB, NNP]
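Since the question title asks about splitting into two columns, the same pattern can be applied once per tuple position. The following is only a sketch on shortened sample data; the column name `word` is a hypothetical choice, not something from the question:

```python
import pandas as pd

# Shortened sample data in the same shape as the question's column_A.
df = pd.DataFrame({'column_A': [
    [('Ahli', 'NNP'), ('dan', 'CC')],
    [('fosil', 'NN'), ('yang', 'SC')],
]})

# 'word' is a hypothetical column name used for illustration.
df['word'] = df['column_A'].map(lambda l: [a for a, b in l])
df['postag'] = df['column_A'].map(lambda l: [b for a, b in l])

print(df[['word', 'postag']])
```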

1

Try using the apply function on the existing column to get a new column with the desired result.

Example Pseudocode:

df['postag'] = df['column_A'].apply(your_function)

In your_function, write the logic for separating the POS tags from the list of tuples.
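One possible way to fill in that function is sketched below. The name `extract_postags` is hypothetical, and the sample data is a shortened version of the question's; it assumes every element of the list is a two-item (word, tag) tuple:

```python
import pandas as pd

def extract_postags(pairs):
    """Return the second element (the POS tag) of each (word, tag) tuple."""
    return [tag for _word, tag in pairs]

# Shortened sample data in the same shape as the question's column_A.
df = pd.DataFrame({'column_A': [
    [('Ahli', 'NNP'), ('dan', 'CC')],
    [('fosil', 'NN'), ('yang', 'SC')],
]})

# apply calls extract_postags once per row value of column_A.
df['postag'] = df['column_A'].apply(extract_postags)
print(df)
```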

1

You could achieve this with the following apply function:

import pandas as pd

# Each row holds a list of (word, POS tag) tuples.
data = [{'column_A': [('Ahli', 'NNP'),
                      ('paleontologi', 'NNP'),
                      ('Thomas', 'NNP'),
                      ('dan', 'CC'),
                      ('timnya', 'RB'),
                      ('.', 'Z')]},
        {'column_A': [('fosil', 'NN'),
                      ('mamalia', 'NN'),
                      ('yang', 'SC'),
                      ('menghuni', 'VB'),
                      ('Antartika', 'NNP')]}]

df = pd.DataFrame(data)
# Keep only the second element (the tag) of each tuple.
df['postag'] = df['column_A'].apply(lambda x: [y[1] for y in x])
print(df)

Output:

    column_A                                            postag
0   [(Ahli, NNP), (paleontologi, NNP), (Thomas, NN...   [NNP, NNP, NNP, CC, RB, Z]
1   [(fosil, NN), (mamalia, NN), (yang, SC), (meng...   [NN, NN, SC, VB, NNP]
