I'm trying to figure out a good practice for designing a function with (many) optional components.
As a concrete example, say I want to design a feature extractor function that takes a document as input and returns a list of features extracted from it.
Question
If there are many optional components, what kind of approach would be considered good practice and scalable?
Below are a couple of options I have come up with, though there may be other approaches that I have not considered.
Approach 1: class-based
class FeatureExtractor(object):
    """Extract features from text for use in classification."""

    def __init__(self, term_frequency=False, consider_negation=False,
                 pos_tags=False):
        self.term_frequency = term_frequency
        self.consider_negation = consider_negation
        self.pos_tags = pos_tags
        # Could be many more ...

    def extract(self, document):
        """Extract features from a document."""
        features = []
        if self.term_frequency:
            features.extend(self.extract_term_frequency(document))
        if self.consider_negation:
            features.extend(self.extract_negation(document))
        if self.pos_tags:
            features.extend(self.extract_pos_tags(document))
        return features

    # The methods below are placeholders; each returns a list of features.
    def extract_term_frequency(self, document):
        return []

    def extract_negation(self, document):
        return []

    def extract_pos_tags(self, document):
        return []


# `document` is the text to process, defined elsewhere.
extractor = FeatureExtractor(term_frequency=True, consider_negation=True,
                             pos_tags=True)
extractor.extract(document)
Approach 2: function arguments
def extract(document, *functions):
    """Extract features from a document."""
    features = []
    for function in functions:
        features.extend(function(document))
    return features


# The functions below are placeholders; each returns a list of features.
def extract_term_frequency(document):
    return []


def extract_negation(document):
    return []


def extract_pos_tags(document):
    return []


# `document` is the text to process, defined elsewhere.
extract(document, extract_term_frequency, extract_negation, extract_pos_tags)
Approach 3: class with mixins or multiple inheritance
This would be something of a combination of the first and second approaches, though I'm not sure how it would be done.
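One way I could imagine this working, as a rough sketch only: each optional component lives in its own mixin, and a base class collects the output of every method whose name starts with `extract_`. The class names (`TermFrequencyMixin`, `NegationMixin`, `BaseExtractor`, `MyExtractor`), the `extract_` naming convention, and the toy feature logic are all assumptions made up for illustration, not an established pattern from any library.

```python
class TermFrequencyMixin:
    """Hypothetical mixin contributing term-frequency features."""

    def extract_term_frequency(self, document):
        # Toy stand-in: one (token, count) pair per distinct token.
        counts = {}
        for token in document.split():
            counts[token] = counts.get(token, 0) + 1
        return sorted(counts.items())


class NegationMixin:
    """Hypothetical mixin contributing negation features."""

    NEGATIONS = {"not", "no", "never"}

    def extract_negation(self, document):
        # Toy stand-in: flag whether any negation word appears.
        tokens = set(document.lower().split())
        return [("has_negation", bool(tokens & self.NEGATIONS))]


class BaseExtractor:
    """Collect features from every extract_* method found on the instance."""

    def extract(self, document):
        features = []
        # dir() returns names alphabetically, so the order is deterministic.
        for name in dir(self):
            if name.startswith("extract_"):
                features.extend(getattr(self, name)(document))
        return features


# Components are chosen by listing the mixins to inherit from.
class MyExtractor(TermFrequencyMixin, NegationMixin, BaseExtractor):
    pass


features = MyExtractor().extract("this is not bad not bad")
```

With this layout the set of components is fixed when the class is defined rather than chosen per instance, which may or may not be what is wanted compared with the flags in Approach 1.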
Any ideas on a direction to head would be greatly appreciated!