How to label text based on aspect term and sentiment

Question

I have written code that labels text data by aspect term and then by sentiment, using the VADER lexicon. But the output contains only -1 (negative) and 1 (positive), whereas there should be three classes: positive, negative, and neutral.

Here is the code:

import pandas as pd
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

# Define the aspect keywords
system_keywords = ['server', 'bug', 'error', 'sinyal', 'jaringan', 'login', 'update', 
                   'perbaruan', 'loading', 'aplikasi', 'fitur', 'UI/UX' , 'tampilan', 
                   'data', 'otp', 'keamanan']
layanan_keywords = ['customer service', 'cs', 'call center', 'telepon', 'email', 'beli', 
                    'pertalite', 'bbm', 'topup']
transaksi_keywords = ['cash', 'cashless', 'debit', 'tunai', 'scan', 'e-wallet', 
                      'linkaja', 'link', 'bayar', 'ovo', 'transaksi', 'pembayaran', 
                      'cashback', 'struk', 'tunai', 'nontunai']
subsidi_keywords = ['verifikasi', 'data', 'form', 'formulir', 'daftar', 'subsidi', 
                    'pendaftaran', 'subsidi', 'kendaraan', 'formulir', 'stnk', 'ktp', 
                    'nopol', 'no', 'kendaraan', 'nomor', 'polisi', 'foto', 'kendaraan', 
                    'alamat', 'provinsi', 'kota', 'kabupaten', 'kecamatan']
kebermanfaatan_keywords = ['bagus', 'mantap', 'recommend', 'oke', 'mudah', 'berguna', 
                           'membantu', 'simple', 'guna', 'bantu']

# Define a function to label the aspect based on the aspect keywords
def label_aspect(text):
    aspect_labels = [0] * 5 # Initialize all aspect labels to 0
    for i, keywords in enumerate([system_keywords, layanan_keywords, transaksi_keywords, 
           subsidi_keywords, kebermanfaatan_keywords]):
        for keyword in keywords:
            if keyword in text:
                aspect_labels[i] = 1
                break
    return aspect_labels

# Load the data into a DataFrame
data = {'content': ['Sejak menggunakan aplikasi mypertamina beli pertalite jadi lebih simple dan mudah karena aplikasi ini bener bener membantu untuk meringankan penjual dan pembeli recomend bisa bayar pakai tunai atau nontunai mantepp', 
                    'sering ada bug, aplikasi tidak user friendly. bingung dalam menginput data untuk subsidi. tidak ada notifikasi apakah data inputan sudah masuk atau belum. Tolong diperbaiki',
                    'Bagus juga aplikasi, kalo ada promo seperti ini kan para pemakai premium bisa jadi beralih ke pertalite bahkan pertamax. Coba ada promo2 lainnya seperti kerja sama dg situs belanja online ya min. Pertahankan min',
                    'kadang sulit di akses terakhir ada perintah update MyPertamina, saya ikuti, setelah update, jadi sulit masuk seolah data tidak ada, malah QR code tidak bisa muncul, dan belum sempat saya print',
                    'buruk, sudah coba daftar berkali kali tetap gak bisa. Mau beli bbm harus ada barcode, daftar susah ah bukan nya memudahkan rakyat malah tambah mempersulit']}
df = pd.DataFrame(data)

# Create the VADER sentiment analyzer (from the vaderSentiment package imported above)
vader_lexicon = SentimentIntensityAnalyzer()

# Add the aspect columns to the DataFrame and label them
aspect_labels = df['content'].apply(label_aspect)
df['sistem'], df['layanan'], df['transaksi'], df['pendaftaran subsidi'], df['kebermanfaatan'] = zip(*aspect_labels)

# Apply Vader sentiment analysis to label the aspect columns
for col in ['sistem', 'layanan', 'transaksi', 'pendaftaran subsidi', 'kebermanfaatan']:
    df[col] = df['content'].apply(lambda x: 1 if vader_lexicon.polarity_scores(x) 
              ['compound'] >= 0.05 and df[col][0] == 1 else (-1 if 
              vader_lexicon.polarity_scores(x)['compound'] <= -0.05 and df[col][0] == 1 
              else 0))

# Display the resulting DataFrame
df

The output is still not correct. For example:

"Sejak menggunakan aplikasi mypertamina beli pertalite jadi lebih simple dan mudah karena aplikasi ini bener bener membantu untuk meringankan penjual dan pembeli recomend bisa bayar pakai tunai atau nontunai mantepp". None of the words in subsidi_keywords appears in this sentence, but the "pendaftaran subsidi" column contains 1 where it should contain 0.
"sering ada bug, aplikasi tidak user friendly. bingung dalam menginput data untuk subsidi. tidak ada notifikasi apakah data inputan sudah masuk atau belum. Tolong diperbaiki". None of the words in transaksi_keywords, layanan_keywords, or kebermanfaatan_keywords appears in this sentence, but the "transaksi", "layanan", and "kebermanfaatan" columns contain 1 where they should contain 0.
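
A side observation on the first example, offered as a sketch rather than a full diagnosis: label_aspect tests `keyword in text`, which is a substring test. The short keyword 'no' is therefore found inside 'nontunai', and 'guna' inside 'menggunakan'. The test is also case-sensitive, so 'bagus' does not match 'Bagus'. The helper below assumes that whole-word, case-insensitive matching is what was intended; has_keyword is a hypothetical name, not part of the code above.

```python
import re

def has_keyword(text, keywords):
    """Return True if any keyword occurs in text as a whole word (case-insensitive)."""
    for keyword in keywords:
        # (?<!\w) and (?!\w) reject matches embedded in a longer word;
        # re.escape keeps characters such as '/' or '-' literal.
        pattern = r'(?<!\w)' + re.escape(keyword) + r'(?!\w)'
        if re.search(pattern, text, flags=re.IGNORECASE):
            return True
    return False

text = 'bisa bayar pakai tunai atau nontunai mantepp'
print('no' in text)                  # substring test used in the question -> True
print(has_keyword(text, ['no']))     # whole-word test -> False
print(has_keyword(text, ['tunai']))  # genuine whole-word hit -> True
```

Inside label_aspect, the inner loop over keywords could then be replaced by a single call such as `has_keyword(text, keywords)` for each aspect list.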

I don't have access to VADER, but your code seems to work for me when I replace that with random.random()*0.1. I get values of 1, 0, and -1 in the output, as expected.
– Nick, Commented Mar 12, 2023 at 0:31
Note: I would recommend replacing your lambda with a named function so you avoid calling vader_lexicon.polarity_scores(x) twice, or perhaps rewriting it as 0 if df[col] == 0 else (1 if vader_lexicon.polarity_scores(x) >= 0.05 else -1)
– Nick, Commented Mar 12, 2023 at 0:33
Sorry Nick, I still don't understand your answer. Can you provide the code?
– Annisa Lianda, Commented Mar 12, 2023 at 6:44
df[col] = df['content'].apply(lambda x: 0 if df[col] == 0 else (1 if vader_lexicon.polarity_scores(x) >= 0.05 else -1))
– Nick, Commented Mar 12, 2023 at 6:48
I get an error like this: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
– Annisa Lianda, Commented Mar 12, 2023 at 6:54

Nick · Accepted Answer · 2023-03-15 04:00:24Z

Your issue is that you always test df[col][0], the aspect flag of the first row, against 0 or 1, when you should be testing the flag of the row whose content is being scored. You can work around that by using np.where to do the computation. Note that the VADER result you are testing does not vary per column, so you can compute it once, outside the loop:

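import numpy as np
# Score each review once; the sentiment does not depend on the aspect column
compound = df['content'].apply(lambda x: 1 if vader_lexicon.polarity_scores(x)['compound'] >= 0.05 else -1)
for col in ['sistem', 'layanan', 'transaksi', 'pendaftaran subsidi', 'kebermanfaatan']:
    # Keep 0 where the aspect is absent; otherwise use that row's sentiment
    df[col] = np.where(df[col] == 0, 0, compound)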
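
The snippet above maps every compound score below 0.05 to -1, so a neutral review never gets its own class. Here is a minimal sketch of the three-way split the question asks for, using the same ±0.05 thresholds as the question's code. compound_to_class and label_row are hypothetical helper names, and the scores in the example are made-up numbers, not real VADER output:

```python
def compound_to_class(compound, threshold=0.05):
    """Map a VADER-style compound score to 1 (positive), -1 (negative) or 0 (neutral)."""
    if compound >= threshold:
        return 1
    if compound <= -threshold:
        return -1
    return 0

def label_row(aspect_flags, compound):
    """Give each aspect present in ONE row that row's sentiment class; absent aspects stay 0."""
    sentiment = compound_to_class(compound)
    return [sentiment if flag == 1 else 0 for flag in aspect_flags]

# Made-up (aspect flags, compound score) pairs, for illustration only
rows = [([1, 1, 0, 0, 1], 0.62),
        ([1, 0, 0, 1, 0], -0.40),
        ([0, 0, 1, 0, 0], 0.01)]
for flags, score in rows:
    print(label_row(flags, score))
```

With the DataFrame from the question, the same idea would be `sentiment = df['content'].apply(lambda x: compound_to_class(vader_lexicon.polarity_scores(x)['compound']))` followed by `df[col] = np.where(df[col] == 0, 0, sentiment)` inside the loop. As in the question's code, 0 then stands for both "aspect not mentioned" and "mentioned but neutral".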


2 Comments

Annisa Lianda Over a year ago

Hey again @Nick, if you are not busy, would you please help me again with this case: stackoverflow.com/questions/75803977/…? Thank you.

Nick Over a year ago

@AnnisaLianda sorry, I'm not familiar with Flask so I don't think I can help you.
