1

I am writing a simple Flask API. The API gets an npm package_name and returns all its dependencies. In a case that one of its dependencies has its own dependencies, the API should add them too, but in a nested way.

I should return JSON content like this:


[{'package_name': str, 'version': str, 'dependencies': []},
 {'package_name': str, 'version': str, 'dependencies': [{'package_name': str, 'version': str, 'dependencies': []}]},
 {'package_name': str, 'version': str, 'dependencies': []},
  ....]
#

I wrote it in as a recursion algorithm (it works). However, it takes a long runtime, so I want to convert it into an iteration algorithm. I tried a few things, but without success, because I don't keep the nested form.

Here is my recursion algorithm:

import requests as requests
from flask import Flask, jsonify

app = Flask(__name__)


@app.route('/<string:package_name>', methods=['GET'])
def hello_world(package_name):
    response = requests.get('https://registry.npmjs.org/{package_name}/latest'.format(package_name=package_name)).json()
    dependencies = dict(response['dependencies'])
    return jsonify(get_dep(dependencies))


# {'package_name': str, 'version': str, 'dependencies': []}
def get_dep(dependencies: dict):
    to_return = []
    for key in dependencies.keys():
        response = requests.get('https://registry.npmjs.org/{package_name}/{version}'.
                                format(package_name=key, version=dependencies[key])).json()
        if 'dependencies' in response.keys():
            deps = dict(response['dependencies'])
            to_return.append({'package_name': key, 'version': dependencies[key], 'dependencies': get_dep(deps)})
        else:
            to_return.append({'package_name': key, 'version': dependencies[key], 'dependencies': []})
    return to_return


if __name__ == '__main__':
    app.run()

What are some ideas how to achieve that, while keeping the nested form?

4
  • If I understand, you want to convert recursion to iteration because recursion is slower? As per requirement recursion seems to be a perfect solution as you don't know before hand the number of levels you need to go into Commented Aug 10, 2021 at 6:36
  • Yes, exactly! the recursion solution is easier and it more intuitive but also much slower. Commented Aug 10, 2021 at 6:56
  • I am not entirely sure, but I don't think an iterative approach would be faster as is. Since you'll need extra conditions to check if there are more siblings to traverse. But you can potentially thread it out in both ways to increase speed. With Flask threading is not straight forward I suppose. Commented Aug 10, 2021 at 7:00
  • you should be looking at Depth First Search or Breadth First Search concepts with iteration. In your case, you won't be breaking in case of a search Commented Aug 10, 2021 at 7:14

2 Answers 2

2

Taking an object-oriented approach. The below solution works in an iterative manner. But it does not increase the efficiency as expected.

Your recursive code runs in 19.2534s for express package and my code runs in 20.4696s for the same package.

But well it's an iterative method. I maintain a list of all Package objects that need to be processed. And extend the list whenever new dependencies are found. And since by default objects are moved around by reference, updating a dependencies will update it right up until the parent. Finally you can convert the main_package to a JSON of your choice

def get_dependency(package, version):
    response = requests.get('https://registry.npmjs.org/{package_name}/{version}'.
                            format(package_name=package, version=version)).json()
    return response.get('dependencies', [])


class Package:

    def __init__(self, name, version):
        self.name = name
        self.version = version
        self.dependencies = []

    def __set_dependencies(self, deps: Dict[str, str]):
        for dep in deps:
            self.dependencies.append(Package(dep, deps[dep]))

    def get_dependencies(self):
        return self.dependencies

    def fetch_dependencies(self):
        resp = get_dependency(self.name, self.version)
        self.__set_dependencies(resp)


package_name = 'express'
package_version = 'latest'
main_package = Package(package_name, package_version)

dependencies_to_process = [main_package]

while len(dependencies_to_process) > 0:
    package = dependencies_to_process.pop()
    package.fetch_dependencies()
    dependencies_to_process.extend(package.get_dependencies())

# use main_package

Note: You can potentially do this using dictionaries too as they will maintain references, I come from the Java land hence inclined to classes.


EDIT

You can increase efficiency slightly by keeping a temporary cache of fetched Packages. If the same version of a dependency is being fetched, take it from a cache. It saved half a second in my test. Here's a sample code, you may need to save it in a persistent cache instead of package_repo dictionary.

This would have a significant benefit if many high-level dependencies have common children. Then the entire branch would be fetched only once, and since its a reference it still works.


def get_dependency(package, version):
    response = requests.get('https://registry.npmjs.org/{package_name}/{version}'.
                            format(package_name=package, version=version)).json()
    return response.get('dependencies', [])


class Package:
    package_repo = {}

    @staticmethod
    def create_package(name, version):
        key = str(name) + '|' + str(version)
        if key in Package.package_repo:
            pack = Package.package_repo[key]
        else:
            pack = Package(name, version)
            Package.package_repo[key] = pack
        return pack

    def __init__(self, name, version):
        self.name = name
        self.version = version
        self.dependencies = []
        self.__processed = False

    def __set_dependencies(self, deps: Dict[str, str]):
        for dep in deps:
            self.dependencies.append(Package.create_package(dep, deps[dep]))

    def get_dependencies(self):
        return self.dependencies

    def fetch_dependencies(self):
        if not self.__processed:
            resp = get_dependency(self.name, self.version)
            self.__set_dependencies(resp)
            self.__processed = True


package_name = 'express'
package_version = 'latest'
main_package = Package.create_package(package_name, package_version)
dependencies_to_process = [main_package]

while len(dependencies_to_process) > 0:
    package = dependencies_to_process.pop()
    package.fetch_dependencies()
    dependencies_to_process.extend(package.get_dependencies())

EDIT

There was a bug in the cache code, after a fix, there is a significant improvement in runtime as compared to recursive code.

Runtime in seconds (lower is better) - Fetching express:latest

Recursive Iterative Iterative(cache)
19.78 21.01 9.95
18.11 18.28 7.56
16.45 16.92 7.37
17.11 17.46 7.9
17.16 18.6 11.02
18.85 16.64 7.56
16.5 16.55 7.57
16.89 16.82 7.68
16.4 17.09 7.68
16.76 17.61 7.53

Edit: With threading

I wanted to test it with threading, and well it works in less than a second. I hope it helps. Waiting for the threads to complete is a bit hacking, there should be a more elegant solution

import json
import threading
import time
from typing import Dict

import requests


def get_dependency(package: str, version: str):
    response = requests.get('https://registry.npmjs.org/{package_name}/{version}'.
                            format(package_name=package, version=version)).json()
    return response.get('dependencies', [])


all_threads = []


class Package:
    package_repo = {}
    lock = threading.Lock()

    @staticmethod
    def create_package(name: str, version: str):
        key = str(name) + '|' + str(version)
        with Package.lock:
            if key in Package.package_repo:
                pack = Package.package_repo[key]
            else:
                pack = Package(name, version)
                Package.package_repo[key] = pack
        return pack

    def __init__(self, name: str, version: str):
        self.name = name
        self.version = version
        self.dependencies = []
        self.__processed = False
        t = threading.Thread(target=self.fetch_dependencies)
        t.start()
        all_threads.append(t)

    def __set_dependencies(self, deps: Dict[str, str]):
        for dep in deps:
            self.dependencies.append(Package.create_package(dep, deps[dep]))

    def get_dependencies(self):
        return self.dependencies

    def fetch_dependencies(self):
        if not self.__processed:
            resp = get_dependency(self.name, self.version)
            self.__set_dependencies(resp)
            self.__processed = True

    def to_dict(self):
        deps = [d.to_dict() for d in self.dependencies]
        return {'package': self.name, 'version': self.version, 'dependencies': deps}


start = time.time()
package_name = 'express'
package_version = 'latest'
main_package = Package.create_package(package_name, package_version)

while threading.active_count() > 1:
    for t in all_threads:
        t.join(1)

end_time = time.time()

with open('op.json', 'w+') as op:
    json.dump(main_package.to_dict(), op)
Sign up to request clarification or add additional context in comments.

4 Comments

Thanks! really nice solution. surprisingly it is not faster, but I can use that conceptually.
@DeanTaler I'm glad it helps, check the edit it might increase efficiency some more in context of Flask using cache
@DeanTaler I was bored and wanted to try it out, I have a threading sample for you, and it does the job in less than a second. Hope it helps
I haven't used threading in python yet (only in java) so it is a great opportunity to start. looks AMAZING! @shoaib30
0

You can create a dict and need a mechanism to traverse to the parent and sibling elements.

Run a kind of infinite loop and get dependencies put in the dict and find until you hit a wall and go up and try with the siblings. Try until you can do it for all dependencies.

You might want to look at a tree implementation.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.