I am trying to train a GloVe model in Python on my text corpus, following the implementation described on this page (Glove model). I am running into problems reading the corpus file from the specified path:

parser.add_argument('corpus', metavar='corpus_path',
                        type=partial(codecs.open, encoding='utf-8'))

How do I specify the file path for this argument? I used the command-line arguments shown below:

C:\Users\JAYASHREE\Documents\NLP>python Glove_python_bbc.py 'C:/Users/JAYASHREE/Documents/NLP/text-corpus' --vocab-path C:/Users/JAYASHREE/Documents/NLP/vocabulary --cooccur-path C:/Users/JAYASHREE/Documents/NLP/cooccur_matrix -w 10 --min-count 10 --vector-path C:/Users/JAYASHREE/Documents/NLP/word-vector -s 40 --iterations 10 --learning-rate 0.1 --save-often True

I get the following error:

Traceback (most recent call last):
  File "Glove_python_bbc.py", line 380, in <module>
    main(parse_args())
  File "Glove_python_bbc.py", line 70, in parse_args
    return parser.parse_args()
  File "C:\Users\JAYASHREE\Anaconda2\lib\argparse.py", line 1701, in parse_args
    args, argv = self.parse_known_args(args, namespace)
  File "C:\Users\JAYASHREE\Anaconda2\lib\argparse.py", line 1733, in parse_known_args
    namespace, args = self._parse_known_args(args, namespace)
  File "C:\Users\JAYASHREE\Anaconda2\lib\argparse.py", line 1921, in _parse_known_args
    positionals_end_index = consume_positionals(start_index)
  File "C:\Users\JAYASHREE\Anaconda2\lib\argparse.py", line 1898, in consume_positionals
    take_action(action, args)
  File "C:\Users\JAYASHREE\Anaconda2\lib\argparse.py", line 1791, in take_action
    argument_values = self._get_values(action, argument_strings)
  File "C:\Users\JAYASHREE\Anaconda2\lib\argparse.py", line 2231, in _get_values
    value = self._get_value(action, arg_string)
  File "C:\Users\JAYASHREE\Anaconda2\lib\argparse.py", line 2260, in _get_value
    result = type_func(arg_string)
  File "C:\Users\JAYASHREE\Anaconda2\lib\codecs.py", line 896, in open
    file = __builtin__.open(filename, mode, buffering)
IOError: [Errno 22] invalid mode ('rb') or filename: "'C:/Users/JAYASHREE/Documents/NLP/text-corpus'"

How do I pass the corpus path as an argument?

Thanks

1 Answer

On Windows, the command prompt (cmd.exe) does not treat single quotes as quoting characters, so they are not stripped but passed to the program as part of the argument string. See the last line of the error log:

IOError: [Errno 22] invalid mode ('rb') or filename: "'C:/Users/JAYASHREE/Documents/NLP/text-corpus'"

The filename includes the single quotes.
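A minimal sketch of what happens, written for Python 3 (the traceback above comes from Python 2, where the same failure surfaces as IOError). It reuses the argument definition from the question; the temporary file and the argument lists passed to parse_args are illustrative stand-ins for the real corpus and command line. argparse hands the raw argument string to the type callable, so any quote characters that survive the shell end up in the filename given to codecs.open. argparse only converts ArgumentTypeError, TypeError and ValueError from a type callable into a usage message, which is why the question shows a full traceback instead.

```python
import argparse
import codecs
import os
import shutil
import tempfile
from functools import partial

# Same argument definition as in the question: argparse calls
# codecs.open(<raw argument string>, encoding='utf-8') to build the value.
parser = argparse.ArgumentParser()
parser.add_argument('corpus', metavar='corpus_path',
                    type=partial(codecs.open, encoding='utf-8'))

# Throwaway file standing in for the real corpus.
tmp_dir = tempfile.mkdtemp()
corpus_path = os.path.join(tmp_dir, 'text-corpus')
with open(corpus_path, 'w', encoding='utf-8') as f:
    f.write('a tiny corpus\n')

# A bare path is passed through unchanged and opens normally.
args = parser.parse_args([corpus_path])
content = args.corpus.read()
args.corpus.close()

# If the quote characters reach the program (as they do from cmd.exe),
# they become part of the filename and the open fails inside argparse.
quoted = "'" + corpus_path + "'"
failed_name = None
try:
    parser.parse_args([quoted])
except OSError as exc:
    failed_name = exc.filename

shutil.rmtree(tmp_dir)
print(repr(content))
print(repr(failed_name))
```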

Simply replace the single quotes with double quotes or, since your path contains no spaces, omit the quotes altogether, and it will work.
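If you would rather have the script tolerate stray quotes, a custom type function is one option. The sketch below is hypothetical (open_corpus is not part of the original script) and written for Python 3: it strips one pair of matching surrounding quotes, opens the file as UTF-8 as the question's code does, and raises argparse.ArgumentTypeError on failure so argparse prints a usage error instead of a traceback.

```python
import argparse
import codecs
import os
import tempfile


def open_corpus(path):
    """Hypothetical argparse type: drop one pair of matching quotes that
    cmd.exe left around the path, then open the file as UTF-8 text."""
    if len(path) >= 2 and path[0] == path[-1] and path[0] in "'\"":
        path = path[1:-1]
    try:
        return codecs.open(path, encoding='utf-8')
    except OSError as exc:
        # ArgumentTypeError makes argparse report a usage error and exit
        # with status 2, instead of crashing with a traceback.
        raise argparse.ArgumentTypeError("can't open %r: %s" % (path, exc))


parser = argparse.ArgumentParser()
parser.add_argument('corpus', metavar='corpus_path', type=open_corpus)

# Usage: a path still wrapped in single quotes now opens correctly.
tmp_dir = tempfile.mkdtemp()
corpus_path = os.path.join(tmp_dir, 'text-corpus')
with open(corpus_path, 'w', encoding='utf-8') as f:
    f.write('hello\n')

args = parser.parse_args(["'" + corpus_path + "'"])
text = args.corpus.read()
args.corpus.close()
print(repr(text))
```

Fixing the command line is still the simpler solution; this only makes the script more forgiving.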

Unix-like systems don't have this problem, because shells such as bash remove the single quotes before passing the argument to the program.
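The difference can be illustrated with Python's shlex module, which splits a string using POSIX shell-like quoting rules. This is only an approximation of both shells: the plain whitespace split stands in for what cmd.exe passes along for this particular command (no double quotes or backslashes involved), and the command string is a shortened version of the one in the question.

```python
import shlex

# The question's command, shortened; the corpus path is in single quotes.
command = ("python Glove_python_bbc.py "
           "'C:/Users/JAYASHREE/Documents/NLP/text-corpus' --min-count 10")

# shlex.split follows POSIX shell quoting rules, as bash would:
# the quotes only group text and are removed from the argument.
posix_argv = shlex.split(command)

# Splitting on whitespace alone keeps the quotes, which approximates
# what cmd.exe hands to the script for this particular command.
naive_argv = command.split()

print(posix_argv[2])  # C:/Users/JAYASHREE/Documents/NLP/text-corpus
print(naive_argv[2])  # 'C:/Users/JAYASHREE/Documents/NLP/text-corpus'
```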

See this question and this question; they may give you a hint.


2 Comments

When I use double quotes, or no quotes at all, I get the following error: ' [-h] [--vocab-path VOCAB_PATH] [--cooccur-path COOCCUR_PATH] [-w WINDOW_SIZE] [--min-count MIN_COUNT] [--vector-path VECTOR_PATH] [-s VECTOR_SIZE] [--iterations ITERATIONS] [--learning-rate LEARNING_RATE] [--save-often] corpus_path Glove_python_bbc.py: error: unrecognized arguments: True'
@Jayashree That's a separate issue. --save-often does not take a value: it is defined with action='store_true', so passing the flag alone sets it to True. Remove the trailing True and it will work.
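A small Python 3 sketch of that behaviour, using a hypothetical parser that contains only the flag as described in the comment above (action='store_true'). parse_known_args is used so the leftover argument can be inspected instead of triggering the error.

```python
import argparse

# Hypothetical parser holding only the flag discussed above.
parser = argparse.ArgumentParser()
parser.add_argument('--save-often', action='store_true')

# The flag takes no value: its presence alone sets it to True.
flag_only = parser.parse_args(['--save-often']).save_often
absent = parser.parse_args([]).save_often
print(flag_only)  # True
print(absent)     # False

# A trailing "True" is not consumed by the flag, so it is left over;
# parse_args() would report it as "unrecognized arguments: True".
args, leftover = parser.parse_known_args(['--save-often', 'True'])
print(args.save_often, leftover)  # True ['True']
```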
