I have a variable vk_read, filled from Python's HTMLParser, which holds data like this: ['id168233095']

Now I'm trying to collect every value that vk_read takes while the script runs into a single list. It should look like: ['id168233095', 'id1682334534', 'id16823453', 'etc...']

if vk_read:
    vk_ids = []
    for line in vk_read:
        if vk_read != '':
            vk_ids.append(vk_read)
            print(vk_ids)

This is the result:

['id168233095']
['id168233095', 'id168233095']
['id168233095', 'id168233095', 'id168233095']
['id168233095', 'id168233095', 'id168233095', 'id168233095']
['id168233095', 'id168233095', 'id168233095', 'id168233095', 'id168233095']
['id168233095', 'id168233095', 'id168233095', 'id168233095', 'id168233095', 'id168233095']

After some advice, the code was changed (see the end of this post):

if vk_read not in vk_ids:
    vk_ids.append(vk_read)
print(vk_ids)

But in this case the result is:

['id45849605']
['id91877071']
['id17422363']
['id119899405']
['id65045632']
['id168233095']

That means my vk_read adds itself up to 10 times, and then my script starts adding the next one.

I also tried list.insert() and got the same result.

How can I run this loop so that all the different results end up in one list, with the script running as many times as data can be found in the parsed file?

Nota bene: I've updated the code as advised to use list1.append(list0), but in my case this still returns the same result as described above. I also changed the list name to avoid further confusion.

LAST UPDATE: Thanks for the help, guys, you really pushed me in the right direction (same issue on Stack Overflow):

The problem appears to be that you are reinitializing the list to an empty list in each iteration:

from html.parser import HTMLParser
import re, sys, random, csv

with open('test.html', 'r', encoding='utf-8') as content_file:
    read_data = content_file.read()

vk_ids = []

class MyHTMLParser(HTMLParser):

    def handle_starttag(self, tag, attrs):
        href = str(attrs)
        for line in href:
            id_tag = re.findall(r'/\S+$', href)
            id_raw = str(id_tag)

            if re.search(r"/\w+'\)\]", id_raw):
                global vk_read
                vk_read = id_raw
            else:
                break
            for ch in ['/', ')', '[', ']', '"', "'"]:
                if ch in vk_read:

                    vk_read = vk_read.replace(ch, "")

            # https://stackoverflow.com/questions/30328193/python-add-string-to-a-list-loop
            for vk_id in vk_read:
                if vk_id not in vk_ids:
                    vk_ids.append(vk_read)
                    break
            print(vk_ids)
            break

N.B. After the last changes:

print(type(vk_ids))
<class 'list'>
  • for line in vk_read: Why aren't you using line inside your for-loop? Commented May 19, 2015 at 14:17
  • It's probably a good idea not to name a variable list, as it shadows an often used builtin. Commented May 19, 2015 at 14:17
  • list.insert(0, vk_read) is a very inefficient operation because each time you insert an item all the other items need to be shifted one location to the right. This will become really slow if your list grows large. Commented May 19, 2015 at 14:19
  • @trianglesis is it your actual indentation? if so, everything after id_tag= re.findall(...) is wrong. I assume it should all be in the for line in href loop Commented May 19, 2015 at 16:08
  • @JulienSpronck I've made some changes already Commented May 19, 2015 at 16:21
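
To illustrate the comment about list.insert(0, ...): if newest-first order is really wanted, one option is to append normally and reverse once at the end; another is collections.deque, whose appendleft does not shift the existing items. A small sketch, with made-up ids standing in for the parsed values:

```python
from collections import deque

ids = ['id1', 'id2', 'id3']  # hypothetical values, in the order they were parsed

# Option 1: append in parse order, then reverse once when done.
collected = []
for vk_id in ids:
    collected.append(vk_id)
collected.reverse()

# Option 2: a deque supports cheap insertion at the left end.
newest_first = deque()
for vk_id in ids:
    newest_first.appendleft(vk_id)

print(collected)
print(list(newest_first))
```

Both variants end up with the same newest-first order.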

4 Answers

3

How about:

vk_ids = []
if vk_read:
    for line in vk_read:
        vk_ids.append(format(line))
    print(vk_ids)

0

It appears that you are inside a loop and that vk_read is a string that changes at each iteration:

vk_ids = [] ## initialize list outside the main loop

## main loop
for some_variable in some_kind_of_iterator: ## this is just a placeholder, i don't know what your loop looks like.

    ## get the value for vk_read
    vk_read = ...

    ## append to vk_ids
    if vk_read and vk_read not in vk_ids:
        vk_ids.append(vk_read)

print(vk_ids)
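
A runnable sketch of the same idea, assuming a hypothetical list parsed_values that stands in for whatever vk_read is set to on each pass (the ids come from the question's output; the empty string and the duplicate are added for illustration):

```python
# Hypothetical stand-in for the values vk_read takes on successive iterations.
parsed_values = ['id45849605', 'id91877071', '', 'id45849605', 'id168233095']

vk_ids = []  # initialized once, outside the loop

for vk_read in parsed_values:
    # Skip empty strings and values that were already collected.
    if vk_read and vk_read not in vk_ids:
        vk_ids.append(vk_read)

print(vk_ids)
```

Because vk_ids is created before the loop, each new value is added to the same list instead of replacing it.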

1 Comment

I'm trying different constructions and also trying to keep the code readable. Now vk_ids is <class 'list'>, but the list still does not collect the different values from the variable. I've missed something.
0

In your code, you were not making use of the line variable inside the loop. At each iteration, you are inserting the entire vk_read variable.

Assuming that vk_read is a list, you can use a list comprehension:

lis = [line for line in vk_read if line != '']
print(lis)

If you need it reversed (as seems to be the case from your use of insert), just use reversed:

lis = list(reversed([line for line in vk_read if line != '']))

However, vk_read seems to be a string, not a list.
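
The difference shows up as soon as you iterate: a for-loop over a string yields single characters, while appending the string itself keeps it whole. A small sketch, assuming vk_read holds the cleaned-up id from the question as a plain string:

```python
vk_read = 'id168233095'  # assumed to be a plain string, as the parser seems to produce

# Iterating over a string walks through it character by character.
chars = [line for line in vk_read if line != '']
print(chars)

# Appending the string itself keeps the id in one piece.
vk_ids = []
vk_ids.append(vk_read)
print(vk_ids)
```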

13 Comments

His example code is actually equivalent to lis = reversed([vk_read for line in vk_read if vk_read != '']). The if vk_read != '' can be skipped as the loop wouldn't happen if vk_read were equal to an empty string. reversed is used because OP is using list.insert(0, vk_read). The most efficient equivalent would be lis = len(vk_read) * [vk_read] (reversed doesn't really matter because we're just inserting vk_read, not vk_read's ordered contents.)
I just assumed that it was a mistake since line is not referenced inside the loop
While this may or may not be right, you should try to establish the problem the OP ran in to and why this can help solve the problem.
@FrankV let me know if my last edit is more helpful.
With if vk_read: vk_ids = [line for line in vk_read if line != '']; print(vk_ids) I still get the same result, only split into separate characters: ['i', 'd', '1', '6', '8', '2', '3', '3', '0', '9', '5']
0

My bad, I was doing it wrong: the list was created inside the iteration, so every append happened after the previous list had been wiped. Here is a comment about it.
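
For reference, a minimal sketch of one way to keep the list alive across tags: store it on the parser instance and create it once in __init__. The class name LinkIdParser, the sample markup, and the assumption that each id is the last path segment of an href are all hypothetical, not taken from the real test.html:

```python
from html.parser import HTMLParser


class LinkIdParser(HTMLParser):
    """Collects the last path segment of every href, without duplicates."""

    def __init__(self):
        super().__init__()
        self.vk_ids = []  # created once per parser, not once per tag

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        for name, value in attrs:
            if name == 'href' and value:
                vk_id = value.rstrip('/').rsplit('/', 1)[-1]
                if vk_id and vk_id not in self.vk_ids:
                    self.vk_ids.append(vk_id)


# Hypothetical sample markup; the real input would be the contents of test.html.
sample_html = '''
<a href="/id168233095">first</a>
<a href="/id1682334534">second</a>
<a href="/id168233095">first again</a>
'''

parser = LinkIdParser()
parser.feed(sample_html)
print(parser.vk_ids)
```

Reading the href value directly from attrs also avoids converting the attribute list to a string and cleaning it up with regular expressions.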
