Below are the answers to your four questions. I'm afraid some of the things you want to do are not possible in the standard library, unless you want to parse the docstrings yourself.
(1) BTW, what's dir(function) call?
If I understand this question correctly, I believe the docs answer that question here:
If the object has a method named __dir__(), this method will be called
and must return the list of attributes. This allows objects that
implement a custom __getattr__() or __getattribute__() function to
customize the way dir() reports their attributes.
If the object does not provide __dir__(), the function tries its best
to gather information from the object’s __dict__ attribute, if
defined, and from its type object.
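For illustration, here is a minimal sketch of a hypothetical class that implements `__dir__` so that `dir()` also reports dynamically handled attributes (all names here are made up):

```python
class Proxy:
    """Hypothetical class whose attributes are partly dynamic."""

    def __getattr__(self, name):
        # Pretend every "dyn_*" attribute exists and return its name.
        if name.startswith("dyn_"):
            return name
        raise AttributeError(name)

    def __dir__(self):
        # Advertise the dynamic attributes so dir() can report them
        # alongside the ordinary ones from object.__dir__().
        return sorted(set(super().__dir__()) | {"dyn_foo", "dyn_bar"})

p = Proxy()
print("dyn_foo" in dir(p))  # True -- dir() used our __dir__
```

Without the `__dir__` method, `dir(p)` would fall back to `__dict__` and the type, and the dynamic names would be invisible.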
(2) How do I know what parameters are necessary to call the function?
The best way is to use the inspect module (note that getargspec is deprecated and was removed in Python 3.11; inspect.signature or getfullargspec is the modern replacement):
>>> from nltk import pos_tag
>>> from inspect import getargspec
>>> getargspec(pos_tag)
ArgSpec(args=['tokens'], varargs=None, keywords=None, defaults=None) # a named tuple
>>> getargspec(pos_tag).args
['tokens']
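On Python 3, the equivalent with `inspect.signature` looks like the sketch below. A stand-in function is used here so the example is self-contained; the real `pos_tag` signature may differ between NLTK versions:

```python
from inspect import signature

# Stand-in for nltk's pos_tag; signature() works the same on any callable.
def pos_tag(tokens):
    """Tag the given list of tokens."""

sig = signature(pos_tag)
print(list(sig.parameters))                           # ['tokens']
print(sig.parameters['tokens'].default is sig.empty)  # True: no default value
```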
(3) If a docstring is available for the function is there a way to
know what is the parameter type that the function is expecting for a
specific parameter?
Not in the standard library, unless you want to parse the docstring on your own. You probably already know that you can access the docstrings like this:
>>> from inspect import getdoc
>>> print(getdoc(pos_tag))
Use NLTK's currently recommended part of speech tagger to
tag the given list of tokens.
>>> from nltk.tag import pos_tag
>>> from nltk.tokenize import word_tokenize
>>> pos_tag(word_tokenize("John's big idea isn't all that bad."))
[('John', 'NNP'), ("'s", 'POS'), ('big', 'JJ'), ('idea', 'NN'), ('is',
'VBZ'), ("n't", 'RB'), ('all', 'DT'), ('that', 'DT'), ('bad', 'JJ'),
('.', '.')]
:param tokens: Sequence of tokens to be tagged
:type tokens: list(str)
:return: The tagged tokens
:rtype: list(tuple(str, str))
or this (func_code is the Python 2 name; in Python 3 the attribute is __code__):
>>> print(pos_tag.__code__.co_consts[0])
Use NLTK's currently recommended part of speech tagger to
tag the given list of tokens.
>>> from nltk.tag import pos_tag
>>> from nltk.tokenize import word_tokenize
>>> pos_tag(word_tokenize("John's big idea isn't all that bad."))
[('John', 'NNP'), ("'s", 'POS'), ('big', 'JJ'), ('idea', 'NN'), ('is',
'VBZ'), ("n't", 'RB'), ('all', 'DT'), ('that', 'DT'), ('bad', 'JJ'),
('.', '.')]
:param tokens: Sequence of tokens to be tagged
:type tokens: list(str)
:return: The tagged tokens
:rtype: list(tuple(str, str))
If you want to try to parse the params and "types" by yourself, you could start with a regex. Clearly, though, I am using the word "type" loosely. Moreover, this approach will only work for docstrings that list their parameters and types in this specific way:
>>> import re
>>> params = re.findall(r'(?<=:)type\s+([\w]+):\s*(.*?)(?=\n|$)', getdoc(pos_tag))
>>> for param, type_ in params:
...     print(param, '=>', type_)
...
tokens => list(str)
This approach gives you each parameter together with its type description. You could also check each word in the description by splitting the string and keeping only those words for which the following holds:

isinstance(eval(word), type)

For example:
>>> isinstance(eval('list'), type)
True
But this approach can quickly get complicated, especially when trying to parse the last parameter of pos_tag. Moreover, docstrings often will not have this format at all, so this would likely work only with NLTK, and even then not all of the time.
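If you do go down that road, a safer variant of the eval check is to look each word up in the builtins module instead of evaluating arbitrary docstring text. A sketch:

```python
import builtins

def is_type_name(word):
    """True if `word` names a built-in type, e.g. 'list' or 'str'."""
    obj = getattr(builtins, word, None)
    return isinstance(obj, type)

print(is_type_name('list'))    # True  -- 'list' is a built-in type
print(is_type_name('tokens'))  # False -- just an ordinary word
```

This avoids running code from the docstring, which eval would happily do.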
(4) And lastly, is there a way to know what is the return type?
Again, I'm afraid not, unless you want to use the regex example above to comb through the docstring. The return type might very well vary depending on the arg(s) type(s). (Consider any function that will work with any iterable.) If you want to try to extract this information from a docstring (again, in the exact format of the pos_tag docstring), you can try another regex:
>>> return_ = re.search(r'(?<=:)rtype:\s*(.*?)(?=\n|$)', getdoc(pos_tag))
>>> if return_:
...     print('return "type" =', return_.group())
...
return "type" = rtype: list(tuple(str, str))
Otherwise, the best we can do here is to get the source code (which again, is explicitly what you do not want):
>>> import inspect
>>> print(inspect.getsource(pos_tag))
def pos_tag(tokens):
    """
    Use NLTK's currently recommended part of speech tagger to
    tag the given list of tokens.

    >>> from nltk.tag import pos_tag
    >>> from nltk.tokenize import word_tokenize
    >>> pos_tag(word_tokenize("John's big idea isn't all that bad."))
    [('John', 'NNP'), ("'s", 'POS'), ('big', 'JJ'), ('idea', 'NN'), ('is',
    'VBZ'), ("n't", 'RB'), ('all', 'DT'), ('that', 'DT'), ('bad', 'JJ'),
    ('.', '.')]

    :param tokens: Sequence of tokens to be tagged
    :type tokens: list(str)
    :return: The tagged tokens
    :rtype: list(tuple(str, str))
    """
    tagger = load(_POS_TAGGER)
    return tagger.tag(tokens)