
I’m doing seq2seq machine translation on my own dataset. I have preprocessed my dataset using the code below.

The problem comes when I try to build the iterator for train_data using BucketIterator.splits().

def tokenize_word(text):
  return nltk.word_tokenize(text)

id = Field(sequential=True, tokenize = tokenize_word, lower=True, init_token="<sos>", eos_token="<eos>")
ti = Field(sequential=True, tokenize = tokenize_word, lower=True, init_token="<sos>", eos_token="<eos>")

fields = {'id': ('i', id), 'ti': ('t', ti)}

train_data = TabularDataset.splits(
    path='/content/drive/MyDrive/Colab Notebooks/Tidore/',
    train = 'id_ti.tsv',
    format='tsv',
    fields=fields
)[0]

id.build_vocab(train_data)
ti.build_vocab(train_data)

print(f"Unique tokens in source (id) vocabulary: {len(id.vocab)}")
print(f"Unique tokens in target (ti) vocabulary: {len(ti.vocab)}")

train_iterator = BucketIterator.splits(
    (train_data),
    batch_size = batch_size,
    sort_within_batch = True,
    sort_key = lambda x: len(x.id),
    device = device
)

print(len(train_iterator))

for data in train_iterator:
  print(data.i)

This is the result of the code above

Unique tokens in source (id) vocabulary: 1425
Unique tokens in target (ti) vocabulary: 1297
2004

---------------------------------------------------------------------------

AttributeError                            Traceback (most recent call last)

<ipython-input-72-e73a211df4bd> in <module>()
     31 
     32 for data in train_iterator:
---> 33   print(data.i)

AttributeError: 'BucketIterator' object has no attribute 'i'

This is what I get when I try to iterate over train_iterator and print from it.

I am very confused, because I don’t know which attribute I should use on the train iterator. Thank you for your help.


2 Answers

train_iterator = BucketIterator.splits(
  (train_data),
  batch_size = batch_size,
  sort_within_batch = True,
  sort_key = lambda x: len(x.id),
  device = device
)

The problem is here. Use BucketIterator instead of BucketIterator.splits when only one iterator needs to be generated: splits is meant to build one iterator per dataset (train/valid/test) and returns a tuple of iterators, not a single iterator.

I ran into this problem myself, and the change above fixed it.
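To see why the original call misbehaves: `(train_data)` is just `train_data`, since only a trailing comma makes a one-element tuple in Python. The legacy `splits` classmethod then iterates over what it was given and builds one iterator per element. Here is a plain-Python analogy (no torchtext required; `splits` below is a stand-in, not the real implementation):

```python
def splits(datasets):
    # Rough analogy of the legacy torchtext classmethod:
    # build one "iterator" per entry in `datasets`, return a tuple.
    return tuple(f"iterator({d})" for d in datasets)

examples = ["ex0", "ex1", "ex2"]   # stands in for a Dataset of Examples

correct = splits((examples,))      # one-element tuple -> 1 iterator
wrong = splits((examples))         # (examples) == examples -> 3 "iterators"!

print(len(correct))  # 1
print(len(wrong))    # 3
```

This would also explain why `print(len(train_iterator))` showed 2004 in the question: one iterator was created per example in the dataset, and iterating over that tuple yields BucketIterator objects, hence the AttributeError.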



According to the torchtext documentation, it's better to use TranslationDataset for what you want to do, but if for some reason you prefer TabularDataset, it's better to do it like this:

import nltk
print(nltk.__version__)
from torchtext import data
import torchtext
print(torchtext.__version__)

def tokenize_word(text):
    return nltk.word_tokenize(text)

batch_size = 5

SRC = data.Field(sequential=True, tokenize=tokenize_word, lower=True, init_token="<sos>", eos_token="<eos>")
TRG = data.Field(sequential=True, tokenize=tokenize_word, lower=True, init_token="<sos>", eos_token="<eos>")

train = data.TabularDataset.splits(
    path='./data/', train='tr.tsv', format='tsv',
    fields=[('src', SRC), ('trg', TRG)])[0]

SRC.build_vocab(train)
TRG.build_vocab(train)

# Note BucketIterator (not .splits) for a single dataset, and that the
# sort key must reference a field that exists on the examples ('src'/'trg').
train_iter = data.BucketIterator(
    train, batch_size=batch_size,
    sort_key=lambda x: len(x.src), device=0)

for item in train_iter:
    print(item.trg)

Output:

3.6.2
0.6.0
tensor([[2, 2, 2, 2, 2],
        [5, 5, 5, 5, 5],
        [4, 4, 4, 4, 4],
        [6, 6, 6, 6, 6],
        [7, 7, 7, 7, 7],
        [3, 3, 3, 3, 3]])
tensor([[2, 2, 2, 2, 2],
        [5, 5, 5, 5, 5],
        [4, 4, 4, 4, 4],
        [6, 6, 6, 6, 6],
        [7, 7, 7, 7, 7],
        [3, 3, 3, 3, 3]])

NOTE: make sure there is a tr.tsv file in the data directory containing two text columns separated by a tab. Welcome to Stack Overflow & hope it helps :)
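If you want to sanity-check the expected file layout first, a minimal tr.tsv can be generated like this (the sentence pairs below are placeholders, not real parallel data):

```python
import csv
import os
import tempfile

# Two-column TSV in the layout TabularDataset expects: source<TAB>target per line.
rows = [
    ("first source sentence", "first target sentence"),
    ("second source sentence", "second target sentence"),
]

data_dir = tempfile.mkdtemp()          # use './data/' in the actual project
path = os.path.join(data_dir, "tr.tsv")
with open(path, "w", newline="", encoding="utf-8") as f:
    csv.writer(f, delimiter="\t").writerows(rows)

with open(path, encoding="utf-8") as f:
    print(f.read())
```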

