I’m doing seq2seq machine translation on my own dataset. I have preprocessed my dataset using the code below.
The problem comes when I try to build an iterator over train_data using BucketIterator.splits():
def tokenize_word(text):
    return nltk.word_tokenize(text)
id = Field(sequential=True, tokenize = tokenize_word, lower=True, init_token="<sos>", eos_token="<eos>")
ti = Field(sequential=True, tokenize = tokenize_word, lower=True, init_token="<sos>", eos_token="<eos>")
fields = {'id': ('i', id), 'ti': ('t', ti)}
train_data = TabularDataset.splits(
    path='/content/drive/MyDrive/Colab Notebooks/Tidore/',
    train='id_ti.tsv',
    format='tsv',
    fields=fields
)[0]
id.build_vocab(train_data)
ti.build_vocab(train_data)
print(f"Unique tokens in source (id) vocabulary: {len(id.vocab)}")
print(f"Unique tokens in target (ti) vocabulary: {len(ti.vocab)}")
train_iterator = BucketIterator.splits(
    (train_data),
    batch_size = batch_size,
    sort_within_batch = True,
    sort_key = lambda x: len(x.id),
    device = device
)
print(len(train_iterator))
for data in train_iterator:
    print(data.i)
This is the output of the code above:
Unique tokens in source (id) vocabulary: 1425
Unique tokens in target (ti) vocabulary: 1297
2004
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-72-e73a211df4bd> in <module>()
31
32 for data in train_iterator:
---> 33 print(data.i)
AttributeError: 'BucketIterator' object has no attribute 'i'
This is the result when I tried to print the train_iterator:

I am very confused, because I don’t know which key I should use to access the batches from the train iterator. Thank you for your help.
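
A possible explanation, sketched in plain Python without torchtext: `(train_data)` is just `train_data` in parentheses, not a tuple. A function that builds one iterator per element of its argument would then create one iterator per example, which would match the 2004 printed above and the error about a 'BucketIterator' object having no attribute 'i'. `FakeIterator` and `splits` below are hypothetical stand-ins written for illustration, not torchtext's implementation, and the assumption that BucketIterator.splits() behaves this way has not been verified here.

```python
class FakeIterator:
    """Hypothetical stand-in for BucketIterator; it only stores what it is given."""

    def __init__(self, dataset):
        self.dataset = dataset


def splits(datasets):
    """Hypothetical helper: build one iterator per element of `datasets`."""
    return tuple(FakeIterator(datasets[i]) for i in range(len(datasets)))


# Stand-in dataset: 2004 examples, the number taken from the output above.
train_data = [f"example {n}" for n in range(2004)]

wrong = splits((train_data))   # parentheses only: identical to splits(train_data)
right = splits((train_data,))  # trailing comma: a real one-element tuple

print(len(wrong))  # one "iterator" per example
print(len(right))  # one iterator for the whole dataset

train_iterator = right[0]  # the single iterator over all of train_data
```

If this is what happens here, passing `(train_data,)` and taking element `[0]`, or constructing `BucketIterator(train_data, ...)` directly, may give a single iterator whose batches expose `i` and `t`. The `sort_key` would presumably also need `len(x.i)`, since the fields are registered under the names `i` and `t`.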