I have a variable vk_read, filled in by Python's HTMLParser, which holds data like this: ['id168233095']
Now I'm trying to collect every value that vk_read takes while the script runs into a single list. It should look like: ['id168233095', 'id1682334534', 'id16823453', 'etc...']
    if vk_read:
        vk_ids = []
        for line in vk_read:
            if vk_read != '':
                vk_ids.append(vk_read)
                print(vk_ids)
This is the result:
['id168233095']
['id168233095', 'id168233095']
['id168233095', 'id168233095', 'id168233095']
['id168233095', 'id168233095', 'id168233095', 'id168233095']
['id168233095', 'id168233095', 'id168233095', 'id168233095', 'id168233095']
['id168233095', 'id168233095', 'id168233095', 'id168233095', 'id168233095', 'id168233095']
After some advice, the code was changed (see the end of this post):
    if vk_read not in vk_ids:
        vk_ids.append(vk_read)
        print(vk_ids)
But in this case the result is:
['id45849605']
['id91877071']
['id17422363']
['id119899405']
['id65045632']
['id168233095']
That means my vk_read adds itself up to 10 times, and only then does my script start adding the next one.
I also tried list.insert() and got the same result.
How can I run this loop so that all the different results end up in one list, with the script running as many times as data can be found in the parsed file?
Nota bene:
I've updated the code as advised to use list1.append(list0), but in my case this method still returns the same result as described above.
I also changed the list name to avoid further confusion.
LAST UPDATE: Thanks for helping, guys, you really pushed me in the right direction (same question on Stack Overflow).
The problem appears to be that you are reinitializing the list to an empty list in each iteration:
    from html.parser import HTMLParser
    import re, sys, random, csv

    with open('test.html', 'r', encoding='utf-8') as content_file:
        read_data = content_file.read()

    vk_ids = []

    class MyHTMLParser(HTMLParser):
        def handle_starttag(self, tag, attrs):
            href = str(attrs)
            for line in href:
                id_tag = re.findall('/\S+$', href)
                id_raw = str(id_tag)
                if re.search('/\w+\'\)\]', id_raw):
                    global vk_read
                    vk_read = id_raw
                else:
                    break
                for ch in ['/', ')', '[', ']', '"', "'"]:
                    if ch in vk_read:
                        vk_read = vk_read.replace(ch, "")
                # https://stackoverflow.com/questions/30328193/python-add-string-to-a-list-loop
                for vk_id in vk_read:
                    if vk_id not in vk_ids:
                        vk_ids.append(vk_read)
                        break
                print(vk_ids)
                break
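The advice about not reinitializing the list can be sketched as follows. This is only an illustration under assumptions, not the asker's final code: the class name VkIdParser, the inline SAMPLE_HTML string (standing in for test.html), and the pattern /id followed by digits are all hypothetical. The list is created once on the parser instance, and the attrs pairs are read directly instead of being converted to a string and looped over character by character.

```python
from html.parser import HTMLParser
import re

# Hypothetical sample input; the real script reads test.html instead.
SAMPLE_HTML = """
<a href="/id168233095">first</a>
<a href="/id45849605">second</a>
<a href="/id168233095">duplicate of first</a>
<a href="/about">not an id link</a>
"""


class VkIdParser(HTMLParser):
    """Collects unique ids from hrefs that look like /id<digits> (assumed format)."""

    def __init__(self):
        super().__init__()
        self.vk_ids = []  # created once, never reset inside the handler

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs, so no str() round trip is needed.
        for name, value in attrs:
            if name != "href" or value is None:
                continue
            match = re.fullmatch(r"/(id\d+)", value)
            if match and match.group(1) not in self.vk_ids:
                self.vk_ids.append(match.group(1))


parser = VkIdParser()
parser.feed(SAMPLE_HTML)
print(parser.vk_ids)  # ['id168233095', 'id45849605']
```

Keeping the list on the instance also removes the need for the global statement; for a large number of ids, a set alongside the list would make the membership check cheaper.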
N.B. After the last changes:
    print(type(vk_ids))
    <class 'list'>
for line in vk_read: Why aren't you using 'line' inside your for-loop?
Don't use 'list' as a variable name, as it shadows an often used builtin.
list.insert(0, vk_read) is a very inefficient operation, because each time you insert an item all the other items need to be shifted one location to the right. This will become really slow if your list grows large.
id_tag = re.findall(...) is wrong. I assume it should all be in the 'for line in href' loop.
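Two of the comments above can be illustrated with a short sketch. The sample values are hypothetical, and collections.deque is a substitute suggested here, not something the commenters named: it is one standard-library option if newest-first order is actually needed.

```python
from collections import deque

vk_read = "id168233095"  # sample value from the question

# Iterating over a string yields one character per step, so a loop body
# that ignores the loop variable simply repeats the same action each time.
steps = [ch for ch in vk_read]
print(len(steps))  # one iteration per character

ids = ["id1", "id2", "id3"]  # hypothetical sample values

# Appending at the end does not shift any existing items.
appended = []
for vk_id in ids:
    appended.append(vk_id)

# deque.appendleft adds at the front without shifting items,
# unlike list.insert(0, ...), which moves every existing element.
newest_first = deque()
for vk_id in ids:
    newest_first.appendleft(vk_id)

print(appended)
print(list(newest_first))
```

If the order of insertion is all that matters, plain append is usually enough, and the list can be reversed once at the end if needed.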