I am working on a project using the Berka dataset, and I want to build a neural network to predict the loan status for accounts. The dataset contains multiple tables, and I want to avoid flattening them into a single table. Instead, I aim to feed the NN structured data as follows:
Input: For each account, include:
- All transaction records associated with the account
- Account-level details
- Data from "disp" and "card" tables
- Order records
Output: Predict the loan status (A, B, C, D).
Here are the dimensions after processing:
Transactions: (682, 675, 24)
Account: (682, 8)
Disp and Card: (682, 3, 9)
Order: (682, 5, 6)
Labels (Y): (682, 4) (one-hot encoded into a binary representation of the four classes)
Here the last number is the number of features, and the middle number (for the three-dimensional arrays) is the number of entries per account, padded to a fixed length.
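For context, the padding works like this per table; a minimal sketch for the transactions tensor (the variable names and dummy data are illustrative, not my actual pipeline):

```python
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

# One variable-length (n_i, 24) array per account; dummy data for illustration.
txn_lists = [np.random.rand(np.random.randint(1, 676), 24) for _ in range(682)]

transactions = pad_sequences(
    txn_lists,
    maxlen=675,        # length of the longest transaction history
    dtype="float32",
    padding="post",    # zeros appended after the real records
    value=0.0,         # must match mask_value in the Masking layer
)
print(transactions.shape)  # (682, 675, 24)
```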
The distribution of loan_status in Y is as follows:
C: 403,
A: 203,
D: 45,
B: 31
Classes D (debt) and B (default) are minority classes, but the model needs high recall for these.
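One option I am considering for this skew is weighting the loss by inverse class frequency; a minimal sketch with scikit-learn, assuming Y is the one-hot label matrix described above:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Recover integer classes from the one-hot labels (columns ordered A, B, C, D)
y_int = np.argmax(Y, axis=1)

weights = compute_class_weight(class_weight="balanced",
                               classes=np.unique(y_int), y=y_int)
class_weight = dict(zip(np.unique(y_int), weights))
# Keras scales each sample's loss by its class weight:
# model.fit(..., class_weight=class_weight)
```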
Neural Network
I plan to use an LSTM for the variable-length tables (e.g., transactions). Below are my input layers:
from tensorflow.keras.layers import (Input, Masking, LSTM, Dense,
                                     BatchNormalization, Concatenate)

transaction_input = Input(shape=(None, num_transaction_features), name="transaction_input")
account_input = Input(shape=(num_account_features,), name="account_input")
disp_card_input = Input(shape=(None, num_disp_card_features), name="disp_card_input")
order_input = Input(shape=(None, num_order_features), name="order_input")
My current architecture
# Process transactions: mask the zero padding, then stack two LSTMs
x_trans = Masking(mask_value=0.0)(transaction_input)
x_trans = LSTM(64, return_sequences=True)(x_trans)   # returns the full sequence
x_trans = LSTM(64, return_sequences=False)(x_trans)  # returns only the final state
x_trans = BatchNormalization()(x_trans)
x_trans = Dense(32, activation="relu")(x_trans)
x_trans = Dense(16, activation="relu")(x_trans)
# Process account data with Dense layers
x_account = Dense(32, activation="relu")(account_input)
x_account = Dense(16, activation="relu")(x_account)
x_account = BatchNormalization()(x_account)
# Process disp/card rows: mask the padding, summarize with a small LSTM
x_disp_card = Masking(mask_value=0.0)(disp_card_input)
x_disp_card = LSTM(32, return_sequences=False)(x_disp_card)
x_disp_card = Dense(16, activation="relu")(x_disp_card)
# Process standing orders the same way
x_order = Masking(mask_value=0.0)(order_input)
x_order = LSTM(16, return_sequences=False)(x_order)
x_order = Dense(16, activation="relu")(x_order)
# Merge the four branches
combined = Concatenate()([x_trans, x_account, x_disp_card, x_order])
# Optionally, add another layer to see if it helps
combined = Dense(16, activation="relu")(combined)
combined = Dense(16, activation="relu")(combined)
# Output layer
output = Dense(4, activation="softmax")(combined) # classification output
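For completeness, this is roughly how I build and train the model from these branches (the training-array names and hyperparameters are illustrative; class_weight is the optional dict from the sketch above):

```python
from tensorflow.keras.models import Model

model = Model(
    inputs=[transaction_input, account_input, disp_card_input, order_input],
    outputs=output,
)
model.compile(
    optimizer="adam",
    loss="categorical_crossentropy",  # matches the one-hot (682, 4) labels
    metrics=["accuracy"],
)
model.fit(
    [transactions, account, disp_card, order],  # the four arrays described above
    Y,
    epochs=50,
    batch_size=32,
    validation_split=0.2,
    class_weight=class_weight,  # optional: inverse-frequency weights from above
)
```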
Currently the model performs poorly; here is its classification report:
| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| A | 0.39 | 0.53 | 0.45 | 76 |
| B | 0.00 | 0.00 | 0.00 | 0 |
| C | 0.87 | 0.74 | 0.80 | 236 |
| D | 0.73 | 0.55 | 0.63 | 29 |
| Accuracy |  |  | 0.68 | 341 |
| Macro Avg | 0.50 | 0.45 | 0.47 | 341 |
| Weighted Avg | 0.75 | 0.68 | 0.71 | 341 |
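(The report is produced with something like the following on the held-out split; the test-array names are illustrative.)

```python
import numpy as np
from sklearn.metrics import classification_report

# X_test: list of the four held-out input arrays; Y_test: one-hot labels
y_true = np.argmax(Y_test, axis=1)
y_pred = np.argmax(model.predict(X_test), axis=1)

print(classification_report(
    y_true, y_pred,
    labels=np.arange(4), target_names=["A", "B", "C", "D"],
    zero_division=0,
))
```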
Questions
- Is my approach to feeding structured data into the NN correct?
- How should I configure the LSTM layers to handle variable-length inputs like transactions?
- What should I use instead of LSTM layers for one-to-many data with no sequential pattern (one account may have several owners/users, or many standing orders)? See the pooling sketch after this list for the kind of alternative I mean.
- How can I improve the model's accuracy, and in particular its recall for the minority classes (D and B)?
- What metrics should I be aiming for? As a bank, I don't want to authorise loans for people who are unlikely to pay them back.
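Regarding the non-sequential one-to-many question: the kind of order-invariant replacement I have in mind for, e.g., the disp/card branch is to embed each row and then take a mask-aware mean instead of running an LSTM. A sketch (not validated):

```python
from tensorflow.keras.layers import (Input, Masking, Dense,
                                     TimeDistributed, GlobalAveragePooling1D)

disp_card_input = Input(shape=(None, 9), name="disp_card_input")

x = Masking(mask_value=0.0)(disp_card_input)          # flag zero-padded rows
x = TimeDistributed(Dense(16, activation="relu"))(x)  # embed each row independently
x = GlobalAveragePooling1D()(x)                       # mask-aware mean over real rows
x = Dense(16, activation="relu")(x)
```

Since the mean ignores row order, this branch treats owners/cards as a set rather than a sequence; sum or max pooling would work the same way.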