
I have written code that labels text data first by aspect term and then by sentiment, using the VADER lexicon. But the output contains only -1 (negative) and 1 (positive), whereas there should be three classes: positive, negative, and neutral.

Here is the code:

import pandas as pd
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

# Define the aspect keywords
system_keywords = ['server', 'bug', 'error', 'sinyal', 'jaringan', 'login', 'update', 
                   'perbaruan', 'loading', 'aplikasi', 'fitur', 'UI/UX' , 'tampilan', 
                   'data', 'otp', 'keamanan']
layanan_keywords = ['customer service', 'cs', 'call center', 'telepon', 'email', 'beli', 
                    'pertalite', 'bbm', 'topup']
transaksi_keywords = ['cash', 'cashless', 'debit', 'tunai', 'scan', 'e-wallet', 
                      'linkaja', 'link', 'bayar', 'ovo', 'transaksi', 'pembayaran', 
                      'cashback', 'struk', 'tunai', 'nontunai']
subsidi_keywords = ['verifikasi', 'data', 'form', 'formulir', 'daftar', 'subsidi', 
                    'pendaftaran', 'subsidi', 'kendaraan', 'formulir', 'stnk', 'ktp', 
                    'nopol', 'no', 'kendaraan', 'nomor', 'polisi', 'foto', 'kendaraan', 
                    'alamat', 'provinsi', 'kota', 'kabupaten', 'kecamatan']
kebermanfaatan_keywords = ['bagus', 'mantap', 'recommend', 'oke', 'mudah', 'berguna', 
                           'membantu', 'simple', 'guna', 'bantu']

# Define a function to label the aspect based on the aspect keywords
def label_aspect(text):
    aspect_labels = [0] * 5 # Initialize all aspect labels to 0
    for i, keywords in enumerate([system_keywords, layanan_keywords, transaksi_keywords, 
           subsidi_keywords, kebermanfaatan_keywords]):
        for keyword in keywords:
            if keyword in text:
                aspect_labels[i] = 1
                break
    return aspect_labels

# Load the data into a DataFrame
data = {'content': ['Sejak menggunakan aplikasi mypertamina beli pertalite jadi lebih simple dan mudah karena aplikasi ini bener bener membantu untuk meringankan penjual dan pembeli recomend bisa bayar pakai tunai atau nontunai mantepp', 
                    'sering ada bug, aplikasi tidak user friendly. bingung dalam menginput data untuk subsidi. tidak ada notifikasi apakah data inputan sudah masuk atau belum. Tolong diperbaiki',
                    'Bagus juga aplikasi, kalo ada promo seperti ini kan para pemakai premium bisa jadi beralih ke pertalite bahkan pertamax. Coba ada promo2 lainnya seperti kerja sama dg situs belanja online ya min. Pertahankan min',
                    'kadang sulit di akses terakhir ada perintah update MyPertamina, saya ikuti, setelah update, jadi sulit masuk seolah data tidak ada, malah QR code tidak bisa muncul, dan belum sempat saya print',
                    'buruk, sudah coba daftar berkali kali tetap gak bisa. Mau beli bbm harus ada barcode, daftar susah ah bukan nya memudahkan rakyat malah tambah mempersulit']}
df = pd.DataFrame(data)

# Utilize nltk VADER to use custom lexicon
vader_lexicon = SentimentIntensityAnalyzer()

# Add the aspect columns to the DataFrame and label them
aspect_labels = df['content'].apply(label_aspect)
df['sistem'], df['layanan'], df['transaksi'], df['pendaftaran subsidi'], df['kebermanfaatan'] = zip(*aspect_labels)

# Apply Vader sentiment analysis to label the aspect columns
for col in ['sistem', 'layanan', 'transaksi', 'pendaftaran subsidi', 'kebermanfaatan']:
    df[col] = df['content'].apply(lambda x: 1 if vader_lexicon.polarity_scores(x) 
              ['compound'] >= 0.05 and df[col][0] == 1 else (-1 if 
              vader_lexicon.polarity_scores(x)['compound'] <= -0.05 and df[col][0] == 1 
              else 0))

# Display the resulting DataFrame
df

Here is the output (screenshot not included).

The output is still not correct. For example:

  • "Sejak menggunakan aplikasi mypertamina beli pertalite jadi lebih simple dan mudah karena aplikasi ini bener bener membantu untuk meringankan penjual dan pembeli recomend bisa bayar pakai tunai atau nontunai mantepp". This sentence contains no words from subsidi_keywords, but the "pendaftaran subsidi" column contains 1 when it should contain 0.
  • "sering ada bug, aplikasi tidak user friendly. bingung dalam menginput data untuk subsidi. tidak ada notifikasi apakah data inputan sudah masuk atau belum. Tolong diperbaiki". This sentence contains no words from transaksi_keywords, layanan_keywords, or kebermanfaatan_keywords, but the "transaksi", "layanan", and "kebermanfaatan" columns contain 1 when they should contain 0.
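
One possible cause of the first case, which the thread itself does not confirm: label_aspect tests "keyword in text", which is plain substring containment, so the keyword 'no' is found inside 'nontunai'. Below is a minimal sketch of a whole-word test. The helper names (has_keyword_substring, has_keyword_whole_word) and the shortened keyword list are assumptions made for illustration, not part of the original code.

```python
import re

# Shortened, illustrative subset of the question's subsidi_keywords.
subsidi_keywords = ['verifikasi', 'formulir', 'daftar', 'subsidi', 'no', 'nomor']

# Tail of the first example sentence from the question.
text = 'bisa bayar pakai tunai atau nontunai mantepp'

def has_keyword_substring(text, keywords):
    # Same test as label_aspect: plain substring containment.
    return any(keyword in text for keyword in keywords)

def has_keyword_whole_word(text, keywords):
    # Accept a keyword only when it is not embedded in a longer word.
    # Lookarounds are used instead of \b so keywords such as 'e-wallet'
    # or 'UI/UX' are still handled as single units.
    lowered = text.lower()
    return any(
        re.search(r'(?<!\w)' + re.escape(keyword.lower()) + r'(?!\w)', lowered)
        for keyword in keywords
    )

substring_hit = has_keyword_substring(text, subsidi_keywords)    # 'no' matches inside 'nontunai'
whole_word_hit = has_keyword_whole_word(text, subsidi_keywords)  # no standalone keyword present
```

With this kind of check, short keywords such as 'no', 'cs', or 'link' would only count when they appear as separate words, at the cost of missing inflected forms.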
  • I don't have access to Vader, but your code seems to work for me when I replace that with random.random()*0.1. I get values of 1,0, and -1 as expected in the output. Commented Mar 12, 2023 at 0:31
  • Note I would recommend replacing your lambda function with a defined one so you can avoid making two calls to vader_lexicon.polarity_scores(x) or perhaps rewrite to 0 if df[col] == 0 else (1 if vader_lexicon.polarity_scores(x) >= 0.05 else -1) Commented Mar 12, 2023 at 0:33
  • Sorry Nick, I still don't understand your answer. Can you provide the code? Commented Mar 12, 2023 at 6:44
  • df[col] = df['content'].apply(lambda x: 0 if df[col] == 0 else (1 if vader_lexicon.polarity_scores(x) >= 0.05 else -1)) Commented Mar 12, 2023 at 6:48
  • I get an error like this: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). Commented Mar 12, 2023 at 6:54
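
The ValueError arises because, inside the lambda, df[col] == 0 compares the whole column and yields a boolean Series, which an if expression cannot reduce to a single True or False. A minimal sketch of the "defined function" idea from the comments follows, applied row by row so each comparison sees a scalar. The names to_sentiment and label_row are hypothetical, and the 'compound' column holds made-up numbers standing in for vader_lexicon.polarity_scores(text)['compound'], computed once per row.

```python
import pandas as pd

def to_sentiment(compound):
    # Same thresholds as the question: >= 0.05 positive, <= -0.05 negative, else neutral.
    if compound >= 0.05:
        return 1
    if compound <= -0.05:
        return -1
    return 0

# Stand-in data: aspect flags as produced by label_aspect, plus made-up compound scores.
df = pd.DataFrame({
    'content': ['text a', 'text b', 'text c'],
    'compound': [0.6, -0.4, 0.0],
    'sistem': [1, 0, 1],
    'layanan': [0, 1, 1],
})

def label_row(row, col):
    # row[col] is a scalar for this row, so the comparison yields a plain bool.
    return 0 if row[col] == 0 else to_sentiment(row['compound'])

for col in ['sistem', 'layanan']:
    # axis=1 passes one row at a time; args forwards the column name.
    df[col] = df.apply(label_row, axis=1, args=(col,))
```

Storing the compound score in its own column also avoids calling polarity_scores twice per text, which was the other point raised in the comments.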

1 Answer


Your issue is that the lambda always tests df[col][0], the aspect flag of the first row, where it should be using the flag of the row whose content is being scored. You can work around that by using np.where to do the computation. Note that the VADER result you are testing doesn't vary per column, so you can compute it once outside the loop:

import numpy as np

compound = df['content'].apply(lambda x: 1 if vader_lexicon.polarity_scores(x)['compound'] >= 0.05 else -1)
for col in ['sistem', 'layanan', 'transaksi', 'pendaftaran subsidi', 'kebermanfaatan']:
    df[col] = np.where(df[col] == 0, 0, compound)
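
The snippet above maps every compound score below 0.05 to -1, so the neutral band from the question is lost. The following is a sketch of a variant that keeps all three classes. It assumes the question's thresholds (0.05 and -0.05) and uses made-up compound scores in place of real VADER output so that it runs on its own. np.select is swapped in for the nested conditional; it is not what the answer uses.

```python
import numpy as np
import pandas as pd

# Made-up compound scores stand in for vader_lexicon.polarity_scores(x)['compound'].
df = pd.DataFrame({
    'compound': [0.7, -0.5, 0.01, 0.3],
    'sistem': [1, 1, 1, 0],       # aspect flags as produced by label_aspect
    'transaksi': [0, 1, 0, 1],
})

# Three-way sentiment, computed once because it does not depend on the column.
sentiment = np.select(
    [df['compound'] >= 0.05, df['compound'] <= -0.05],
    [1, -1],
    default=0,
)

for col in ['sistem', 'transaksi']:
    # Keep 0 where the aspect is absent, otherwise take that row's sentiment.
    df[col] = np.where(df[col] == 0, 0, sentiment)
```

One caveat with this encoding: a 0 in an aspect column can mean either "aspect not mentioned" or "aspect mentioned with neutral sentiment". If that distinction matters, a separate marker such as NaN for the absent case may be worth considering.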

2 Comments

Hey again @Nick, if you are not busy, would you please help me again with this case: stackoverflow.com/questions/75803977/…? Thank you
@AnnisaLianda sorry, I'm not familiar with Flask so I don't think I can help you.
