Adding index for a repeated set of values

Question

I have a question about adding some kind of index to a repeated set of values. I have a CSV file with the geological profiles of a few hundred boreholes. Each geological layer has its own numerical code, e.g. sandstone is 24. Sometimes boreholes repeat themselves. In my result file I need X and Y, the bottom layer value, the layer thickness and the numerical code of the layer. If a profile/borehole has two or more layers of the same lithology, they should get some kind of index (24.1 or 24.a; 24.2/24.b...). I couldn't find a way to create this index on Stack Overflow, which is why I'm asking for your help. My code looks like this:

import csv

with open('GeoPrz_WYNIKI.csv', newline='') as file:
    file = csv.reader(file, delimiter=';', quotechar='|')
    measurements = list(file)
transf = []
last_xyz = None
for x, y, glub, idnazw, strop, grub, seria in measurements:
    strop = float(strop)
    grub = float(grub)
    spag = float(format((strop + grub), ".2f"))            
    for line in measurements:
        xyz = x, y, spag
        if xyz == last_xyz:
            continue
        if True:
            last_xyz = xyz
            transf.append([x, y, spag, seria])

Output looks like this:

347591.91   301467.92   19.78   1
347591.91   301467.92   106.06  24
347591.91   301467.92   118.68  25
347591.91   301467.92   120.08  24
347591.91   301467.92   274.3   27

Desired output should look like this:

347591.91   301467.92   19.78   1
347591.91   301467.92   106.06  24
347591.91   301467.92   118.68  25
347591.91   301467.92   120.08  24.1 (or 24a)
347591.91   301467.92   274.3   27

I will be really thankful for your help! Regards, Matsu.

Does your desired output reflect the same input as your actual output? They look like two unrelated data sets. They need to match, or it's hard to understand.
– John Zwinck, Commented Jul 23, 2017 at 11:49
@John Zwinck They are the same. I posted only a small part of the output file. I need to add some kind of index for each repeated set of X, Y and "seria" (layer number). I do not have to use all columns from the input CSV file.
– Mateusz Żeruń, Commented Jul 23, 2017 at 11:57
Since it seems I was not clear, I have edited the sample output in your question to match the desired output, in the way that users of this site expect to see it. Previously the sample output was for more data.
– John Zwinck, Commented Jul 23, 2017 at 12:02

zwer · Accepted Answer · 2017-07-23 11:57:53Z

You can always create a lookup/counter map to be used when you encounter the value again, something like:

import csv

with open('GeoPrz_WYNIKI.csv', newline='') as f:
    reader = csv.reader(f, delimiter=';', quotechar='|')
    transf = []  # the output list
    col_map = {}  # let's use a dict for column lookup / counter
    for row in reader:  # loop through the CSV row by row
        col_val = row[-1]  # the value in the last column
        if col_val in col_map:  # we already encountered this value
            row[-1] = col_val + "." + str(col_map[col_val])  # append the counter to it
            col_map[col_val] += 1  # increase the counter
        else:
            col_map[col_val] = 1  # create a counter for the next occurrence
        # process the rest of the row if needed here
        # ...
        transf.append(row)  # add the (modified) row to the output list

Of course, you can unpack and process each row as you please, including using a different column than the last. This is just a universal parser that matches the presented data format and appends an increasing counter to the last column value on each repeated occurrence.
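The lookup approach above keeps one counter per code for the whole file, so code 24 in a second borehole would continue the numbering started in the first. If the index should restart for each borehole, as the asker's comment about repeated X, Y and "seria" suggests, one option is to key the counter on the coordinates plus the layer code. A minimal sketch, assuming rows shaped like the question's output (x, y, depth, code); the function name `index_repeats` and its parameters are made up for illustration, and the sample rows are the five from the question:

```python
import csv
import io
from collections import defaultdict


def index_repeats(rows, key_cols=(0, 1), code_col=-1):
    """Return copies of rows; a code repeated under the same key gets a '.N' suffix."""
    seen = defaultdict(int)  # (key values..., code) -> occurrences so far
    out = []
    for row in rows:
        row = list(row)  # copy, so the caller's rows are left untouched
        code = row[code_col]
        key = tuple(row[i] for i in key_cols) + (code,)
        count = seen[key]
        if count:  # second or later occurrence within this borehole
            row[code_col] = f"{code}.{count}"
        seen[key] += 1
        out.append(row)
    return out


# The five rows from the question, written as semicolon-delimited CSV.
sample = io.StringIO(
    "347591.91;301467.92;19.78;1\n"
    "347591.91;301467.92;106.06;24\n"
    "347591.91;301467.92;118.68;25\n"
    "347591.91;301467.92;120.08;24\n"
    "347591.91;301467.92;274.3;27\n"
)
rows = list(csv.reader(sample, delimiter=';', quotechar='|'))
result = index_repeats(rows)
for row in result:
    print(*row, sep="\t")  # last column: 1, 24, 25, 24.1, 27
```

For a letter suffix (24a, 24b) the f-string could be swapped for something like `code + string.ascii_lowercase[count - 1]`, which only works while a code repeats fewer than 27 times within one borehole.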
