I am working on a project using the Berka dataset, and I want to build a neural network to predict the loan status for accounts. The dataset contains multiple tables, and I want to avoid flattening them into a single table. Instead, I aim to feed the NN structured data as follows:

Input: For each account, include:

  • All transaction records associated with the account
  • Account-level details
  • Data from "disp" and "card" tables
  • Order records

Output: Predict the loan status (A, B, C, D).
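Concretely, once the model is built (see below), I intend to feed it one array per branch rather than a single flattened matrix, along these lines (a sketch; the X_* and Y names are placeholders for my preprocessed arrays):

model.fit(
    {
        "transaction_input": X_trans,    # one array per named input branch
        "account_input": X_account,
        "disp_card_input": X_disp_card,
        "order_input": X_order,
    },
    Y,  # one-hot loan status
    epochs=50,
    batch_size=32,
    validation_split=0.2,
)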

Here are the dimensions after processing:

Transactions: (682, 675, 24)
Account: (682, 8)
Disp and Card: (682, 3, 9)
Order: (682, 5, 6)
Labels (Y): (682, 4) (one-hot encoded)

In each shape, the first number is the number of accounts, the last is the number of features, and the middle number (in the 3-dimensional arrays) is the number of entries per account, padded to a fixed length.
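For reference, the variable-length tables were brought to these fixed shapes by zero-padding each account's rows, roughly like this (a sketch; per_account_txns stands for the list of per-account transaction matrices):

from tensorflow.keras.preprocessing.sequence import pad_sequences

# per_account_txns: list of 682 arrays, each of shape (n_transactions_i, 24)
X_trans = pad_sequences(per_account_txns, maxlen=675,
                        dtype="float32", padding="post", value=0.0)
# X_trans.shape == (682, 675, 24)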

The distribution of loan_status in Y is as follows:
C: 403,
A: 203,
D: 45,
B: 31

Classes D (debt) and B (default) are minority classes, but the model needs high recall for these.
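Given this skew, I am considering weighting classes inversely to their frequency via class_weight in fit(); a minimal sketch using sklearn (the 0-3 index order assumes the one-hot columns follow alphabetical class order):

import numpy as np
from sklearn.utils.class_weight import compute_class_weight

y_labels = np.array(["C"] * 403 + ["A"] * 203 + ["D"] * 45 + ["B"] * 31)
classes = np.array(["A", "B", "C", "D"])
weights = compute_class_weight("balanced", classes=classes, y=y_labels)
class_weight = dict(enumerate(weights))
# -> approximately {0: 0.84, 1: 5.50, 2: 0.42, 3: 3.79} for A, B, C, D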

Neural Network

I plan to use an LSTM for the variable-length tables (e.g., transactions). Below are my input layers:

from tensorflow.keras.layers import (Input, Masking, LSTM, Dense,
                                     BatchNormalization, Concatenate)

transaction_input = Input(shape=(None, num_transaction_features), name="transaction_input")
account_input = Input(shape=(num_account_features,), name="account_input")
disp_card_input = Input(shape=(None, num_disp_card_features), name="disp_card_input")
order_input = Input(shape=(None, num_order_features), name="order_input")

My current architecture

x_trans = Masking(mask_value=0.0)(transaction_input)  # ignore zero-padded timesteps
x_trans = LSTM(64, return_sequences=True)(x_trans)    # first LSTM returns the full sequence
x_trans = LSTM(64, return_sequences=False)(x_trans)   # second LSTM returns only the final state
x_trans = BatchNormalization()(x_trans)
x_trans = Dense(32, activation="relu")(x_trans)
x_trans = Dense(16, activation="relu")(x_trans)

# Process account data with Dense layers
x_account = Dense(32, activation="relu")(account_input)
x_account = Dense(16, activation="relu")(x_account)
x_account = BatchNormalization()(x_account)

x_disp_card = Masking(mask_value=0.0)(disp_card_input)  # ignore zero-padded rows
x_disp_card = LSTM(32, return_sequences=False)(x_disp_card)
x_disp_card = Dense(16, activation="relu")(x_disp_card)

x_order = Masking(mask_value=0.0)(order_input)  # ignore zero-padded rows
x_order = LSTM(16, return_sequences=False)(x_order)
x_order = Dense(16, activation="relu")(x_order)

# Merge all four branches
combined = Concatenate()([x_trans, x_account, x_disp_card, x_order])

# Optionally, add more layers to see if they help
combined = Dense(16, activation="relu")(combined)
combined = Dense(16, activation="relu")(combined)

# Classification output over the four loan-status classes
output = Dense(4, activation="softmax")(combined)
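For completeness, the branches are tied into a single multi-input model and compiled like this (a minimal sketch; "adam" and categorical cross-entropy are simply the standard choices for a one-hot, 4-class softmax output):

from tensorflow.keras.models import Model

model = Model(
    inputs=[transaction_input, account_input, disp_card_input, order_input],
    outputs=output,
)
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])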

Currently, my model performs poorly; this is its classification report:

Class         Precision  Recall  F1-Score  Support
A                  0.39    0.53      0.45       76
B                  0.00    0.00      0.00        0
C                  0.87    0.74      0.80      236
D                  0.73    0.55      0.63       29
Accuracy                            0.68       341
Macro avg          0.50    0.45      0.47      341
Weighted avg       0.75    0.68      0.71      341

Questions

  1. Is my approach to feeding structured data into the NN correct?
  2. How should I configure the LSTM layers to handle variable-length inputs like transactions?
  3. What should I use instead of LSTM layers for one-to-many relational data with no sequential pattern (one account may have several owners/users, or several standing orders)?
  4. How can I improve my model's accuracy and recall for minority classes (D and B)?
  5. Which metrics should I aim for? As a bank, I don't want to authorise loans for people who are unlikely to pay them back.
