Taking an object-oriented approach. The below solution works in an iterative manner. But it does not increase the efficiency as expected.
Your recursive code runs in 19.2534s for express package
and my code runs in 20.4696s for the same package.
But well it's an iterative method. I maintain a list of all Package objects that need to be processed. And extend the list whenever new dependencies are found. And since by default objects are moved around by reference, updating a dependencies will update it right up until the parent. Finally you can convert the main_package to a JSON of your choice
def get_dependency(package, version):
response = requests.get('https://registry.npmjs.org/{package_name}/{version}'.
format(package_name=package, version=version)).json()
return response.get('dependencies', [])
class Package:
def __init__(self, name, version):
self.name = name
self.version = version
self.dependencies = []
def __set_dependencies(self, deps: Dict[str, str]):
for dep in deps:
self.dependencies.append(Package(dep, deps[dep]))
def get_dependencies(self):
return self.dependencies
def fetch_dependencies(self):
resp = get_dependency(self.name, self.version)
self.__set_dependencies(resp)
package_name = 'express'
package_version = 'latest'
main_package = Package(package_name, package_version)
dependencies_to_process = [main_package]
while len(dependencies_to_process) > 0:
package = dependencies_to_process.pop()
package.fetch_dependencies()
dependencies_to_process.extend(package.get_dependencies())
# use main_package
Note: You can potentially do this using dictionaries too as they will maintain references, I come from the Java land hence inclined to classes.
EDIT
You can increase efficiency slightly by keeping a temporary cache of fetched Packages. If the same version of a dependency is being fetched, take it from a cache. It saved half a second in my test. Here's a sample code, you may need to save it in a persistent cache instead of package_repo dictionary.
This would have a significant benefit if many high-level dependencies have common children. Then the entire branch would be fetched only once, and since its a reference it still works.
def get_dependency(package, version):
response = requests.get('https://registry.npmjs.org/{package_name}/{version}'.
format(package_name=package, version=version)).json()
return response.get('dependencies', [])
class Package:
package_repo = {}
@staticmethod
def create_package(name, version):
key = str(name) + '|' + str(version)
if key in Package.package_repo:
pack = Package.package_repo[key]
else:
pack = Package(name, version)
Package.package_repo[key] = pack
return pack
def __init__(self, name, version):
self.name = name
self.version = version
self.dependencies = []
self.__processed = False
def __set_dependencies(self, deps: Dict[str, str]):
for dep in deps:
self.dependencies.append(Package.create_package(dep, deps[dep]))
def get_dependencies(self):
return self.dependencies
def fetch_dependencies(self):
if not self.__processed:
resp = get_dependency(self.name, self.version)
self.__set_dependencies(resp)
self.__processed = True
package_name = 'express'
package_version = 'latest'
main_package = Package.create_package(package_name, package_version)
dependencies_to_process = [main_package]
while len(dependencies_to_process) > 0:
package = dependencies_to_process.pop()
package.fetch_dependencies()
dependencies_to_process.extend(package.get_dependencies())
EDIT
There was a bug in the cache code, after a fix, there is a significant improvement in runtime as compared to recursive code.
Runtime in seconds (lower is better) - Fetching express:latest
| Recursive |
Iterative |
Iterative(cache) |
| 19.78 |
21.01 |
9.95 |
| 18.11 |
18.28 |
7.56 |
| 16.45 |
16.92 |
7.37 |
| 17.11 |
17.46 |
7.9 |
| 17.16 |
18.6 |
11.02 |
| 18.85 |
16.64 |
7.56 |
| 16.5 |
16.55 |
7.57 |
| 16.89 |
16.82 |
7.68 |
| 16.4 |
17.09 |
7.68 |
| 16.76 |
17.61 |
7.53 |
Edit: With threading
I wanted to test it with threading, and well it works in less than a second. I hope it helps. Waiting for the threads to complete is a bit hacking, there should be a more elegant solution
import json
import threading
import time
from typing import Dict
import requests
def get_dependency(package: str, version: str):
response = requests.get('https://registry.npmjs.org/{package_name}/{version}'.
format(package_name=package, version=version)).json()
return response.get('dependencies', [])
all_threads = []
class Package:
package_repo = {}
lock = threading.Lock()
@staticmethod
def create_package(name: str, version: str):
key = str(name) + '|' + str(version)
with Package.lock:
if key in Package.package_repo:
pack = Package.package_repo[key]
else:
pack = Package(name, version)
Package.package_repo[key] = pack
return pack
def __init__(self, name: str, version: str):
self.name = name
self.version = version
self.dependencies = []
self.__processed = False
t = threading.Thread(target=self.fetch_dependencies)
t.start()
all_threads.append(t)
def __set_dependencies(self, deps: Dict[str, str]):
for dep in deps:
self.dependencies.append(Package.create_package(dep, deps[dep]))
def get_dependencies(self):
return self.dependencies
def fetch_dependencies(self):
if not self.__processed:
resp = get_dependency(self.name, self.version)
self.__set_dependencies(resp)
self.__processed = True
def to_dict(self):
deps = [d.to_dict() for d in self.dependencies]
return {'package': self.name, 'version': self.version, 'dependencies': deps}
start = time.time()
package_name = 'express'
package_version = 'latest'
main_package = Package.create_package(package_name, package_version)
while threading.active_count() > 1:
for t in all_threads:
t.join(1)
end_time = time.time()
with open('op.json', 'w+') as op:
json.dump(main_package.to_dict(), op)