I have a list of patterns:

    patterns_trees = [response.css("#Header").xpath("//a/img/@src"), 
                      response.css("#HEADER").xpath("//a/img/@src"),
                      response.xpath("//header//a/img/@src"),
                      response.xpath("//a[@href='"+response.url+'/'+"']/img/@src"),
                      response.xpath("//a[@href='/']/img/@src")
                      ]

After I traverse the list and find the pattern that matches, I have to pass that pattern as an argument to a callback function:

for pattern_tree in patterns_trees:
...
    pattern_response = scrapy.Request(...,..., meta={"pattern_tree": pattern_tree.extract_first()})

By doing this I get the value extracted by the selector, not the pattern (the selector expression) itself.
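
The difference between a value and a pattern is easiest to see outside Scrapy. A minimal sketch, using plain strings in place of a response (the names `text`, `evaluated` and `deferred` are illustrative and not from the original code): expressions placed in a list are evaluated when the list is built, whereas function objects are stored uncalled and run only when invoked.

```python
text = "Header"

# Each expression runs immediately; the list stores the computed results.
evaluated = [text.upper(), text.lower()]

# Each entry is a function object; nothing runs until it is called.
deferred = [str.upper, str.lower]

print(evaluated)          # the already computed strings
print(deferred[0](text))  # the stored function is applied on demand
```

The same applies to the selector list above: each `response.css(...).xpath(...)` entry has already been evaluated by the time the loop starts.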

THINGS I TRIED:

I tried isolating the patterns in a separate class, but I still have the same problem: I cannot store them as patterns, only as values.

I tried to save them as strings and maybe I can make it work but

What is the most efficient way of storing a list of functions?

UPDATE: Possible solution but too hardcoded and it's too problematic when I want to add more patterns:

def patter_0(response):
    return response.css("#Header").xpath("//a/img/@src")
def patter_1(response):
    return response.css("#HEADER").xpath("//a/img/@src")
.....
class patternTrees:
    patterns = [patter_0,...,patter_n]

    def length_patterns(self):
        return len(self.patterns)
  • Should a specific pattern be related to a specific function? Commented Jan 19, 2018 at 10:21
  • In meta you're sending .extract_first(), the brackets cause it to execute. Try sending .extract_first instead (without brackets) to send the actual function. Commented Jan 19, 2018 at 10:21
  • @Magnus yes I want to have a specific function for each pattern, so that I can send arguments to it Commented Jan 19, 2018 at 10:28
  • @Swier no, that is not the problem, with or without that function pattern_tree is already a computed result Commented Jan 19, 2018 at 10:29
  • You can use the object response is an instance of to access its methods. For strings you can do this: [str.upper, str.lower][0]('test') -> 'TEST' Commented Jan 19, 2018 at 10:50

2 Answers

2

If you're willing to restructure your list of operations, this is a fairly neat solution. I've changed the list of operations to a list of tuples. Each tuple contains a reference to the appropriate function and the arguments to pass to it.

It's fairly easy to add new operations to the list: just specify what function to use, and the appropriate arguments.

If you want to use the result of one operation as an argument to the next, you will have to return the value from execute() and process it in the for loop.

I've replaced the calls on response with strings that are printed, so that you can test it easily without Scrapy.

def response_css_ARG_xpath_ARG(args):
    return "response.css(\"%s\").xpath(\"%s\")" % (args[0],args[1])
    #return response.css(args[0]).xpath(args[1])

def response_xpath_ARG(arg):
    return "response.xpath(\"%s\")" % (arg)
    #return response.xpath(arg)

def execute(function, args):
    response = function(args)
    # do whatever with response
    return response 

response_url = "https://whatever.com"


patterns_trees = [(response_css_ARG_xpath_ARG, ("#Header", "//a/img/@src")), 
                  (response_css_ARG_xpath_ARG, ("#HEADER", "//a/img/@src")),
                  (response_xpath_ARG, ("//header//a/img/@src")),
                  (response_xpath_ARG, ("//a[@href='"+response_url+"/"+"']/img/@src")),
                  (response_xpath_ARG, ("//a[@href='/']/img/@src"))]

for pattern_tree in patterns_trees:
    print(execute(pattern_tree[0], pattern_tree[1]))

Note that execute() can be omitted if you don't need to process the result. Without it, you can call the function directly from the loop:

for pattern_tree in patterns_trees:
    print(pattern_tree[0](pattern_tree[1]))
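
A possible variation on the tuple approach, sketched under assumptions: if every entry's arguments are kept as a real tuple (note the trailing comma needed for a one-element tuple), the call can use `*args` unpacking and the helper functions can take ordinary positional parameters. `FakeResponse`, `css_then_xpath` and `xpath_only` are hypothetical names; `FakeResponse` only records the calls made on it and is not part of Scrapy.

```python
class FakeResponse:
    """Hypothetical stand-in for a response; records the chained calls."""

    def __init__(self, trail="response"):
        self.trail = trail

    def css(self, query):
        return FakeResponse('%s.css("%s")' % (self.trail, query))

    def xpath(self, query):
        return FakeResponse('%s.xpath("%s")' % (self.trail, query))


def css_then_xpath(response, css_query, xpath_query):
    return response.css(css_query).xpath(xpath_query)


def xpath_only(response, xpath_query):
    return response.xpath(xpath_query)


patterns_trees = [
    (css_then_xpath, ("#Header", "//a/img/@src")),
    (xpath_only, ("//header//a/img/@src",)),  # trailing comma makes a tuple
]

response = FakeResponse()
# Unpack each stored argument tuple into the stored function.
results = [function(response, *args).trail for function, args in patterns_trees]
for line in results:
    print(line)
```

Passing the response in explicitly also means the same list can be reused for any response, instead of being tied to one.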

1 Comment

maybe that is the most clever way to do it
0

Not sure I understand what you're trying to do, but could you make your list a list of lambda functions like so:

patterns_trees = [
    lambda response : response.css("#Header").xpath("//a/img/@src"),
    ...
]

And then, in your loop:

for pattern_tree in patterns_trees:
    intermediate_response = scrapy.Request(...)  # without meta kwarg
    pattern_response = pattern_tree(intermediate_response)

Or does leaving out the meta argument have an impact on the response object?
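
One way to combine a list of lambdas with the original goal of telling the callback which pattern matched, sketched with assumptions: if the list is defined once at module level, only the index of the matching pattern needs to travel to the callback (for instance in `meta`), and the callback can look the function up again. `StubResponse`, `first_matching_index` and the sample data are hypothetical; the stub mimics only the shape of the calls used here and is not Scrapy's API.

```python
class StubResponse:
    """Hypothetical stand-in: maps a chained query string to stored values."""

    def __init__(self, data, path=""):
        self.data = data
        self.path = path

    def css(self, query):
        return StubResponse(self.data, self.path + "css:" + query + "|")

    def xpath(self, query):
        return StubResponse(self.data, self.path + "xpath:" + query + "|")

    def extract_first(self):
        values = self.data.get(self.path, [])
        return values[0] if values else None


patterns_trees = [
    lambda response: response.css("#Header").xpath("//a/img/@src"),
    lambda response: response.xpath("//header//a/img/@src"),
]


def first_matching_index(response):
    """Return the index of the first pattern that yields a value, else None."""
    for index, pattern in enumerate(patterns_trees):
        if pattern(response).extract_first() is not None:
            return index
    return None


# Sample page on which only the second pattern matches.
page = StubResponse({"xpath://header//a/img/@src|": ["/logo.png"]})
index = first_matching_index(page)

# The index is a plain int, so it can be passed along and used later
# to re-apply the same pattern to another response.
print(index, patterns_trees[index](page).extract_first())
```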
