I have a question about adding some kind of index to a repeated set of values. I have a CSV file with the geological profiles of a few hundred boreholes. Each geological layer has its own numerical code, e.g. sandstone is 24. Sometimes boreholes are repeated in the file. In my result file I need X and Y, the bottom depth of the layer, the layer thickness and the numerical code of the layer. If a profile/borehole contains two or more layers of the same lithology, the repeated ones should get some kind of index (24.1 or 24.a; 24.2 or 24.b; ...). I couldn't find a way to create this index on Stack Overflow, and that's why I'm asking for your help. My code looks like this:

import csv

with open('GeoPrz_WYNIKI.csv', newline='') as file:
    reader = csv.reader(file, delimiter=';', quotechar='|')
    measurements = list(reader)
transf = []
last_xyz = None
for x, y, glub, idnazw, strop, grub, seria in measurements:
    strop = float(strop)
    grub = float(grub)
    spag = float(format((strop + grub), ".2f"))            
    for line in measurements:
        xyz = x, y, spag
        if xyz == last_xyz:
            continue
        if True:
            last_xyz = xyz
            transf.append([x, y, spag, seria])

Output looks like this:

347591.91   301467.92   19.78   1
347591.91   301467.92   106.06  24
347591.91   301467.92   118.68  25
347591.91   301467.92   120.08  24
347591.91   301467.92   274.3   27

Desired output should look like this:

347591.91   301467.92   19.78   1
347591.91   301467.92   106.06  24
347591.91   301467.92   118.68  25
347591.91   301467.92   120.08  24.1 (or 24a)
347591.91   301467.92   274.3   27

I will be really thankful for your help! Regards, Matsu.

  • Please add the DESIRED output as well. Commented Jul 23, 2017 at 11:42
  • @John Zwinck Thanks for the suggestion! Commented Jul 23, 2017 at 11:48
  • Does your desired output reflect the same input as your actual output? It looks like they are two unrelated data sets. They need to match or it's hard to understand. Commented Jul 23, 2017 at 11:49
  • @John Zwinck They are the same. I posted only a small part of the output file. I have to add some kind of index for a repeated set of X, Y and "seria" (layer number). I do not have to use all columns from the input CSV file. Commented Jul 23, 2017 at 11:57
  • Since it seems I was not clear, I have edited the sample output in your question to match the desired output in the way that users of this site expect to see it. Previously the sample output was for more data. Commented Jul 23, 2017 at 12:02

1 Answer


You can always create a lookup/counter map to be used when you encounter the value again, something like:

import csv

with open('GeoPrz_WYNIKI.csv', newline='') as f:
    reader = csv.reader(f, delimiter=';', quotechar='|')
    transf = []  # the output list
    col_map = {}  # let's use a dict for column lookup / counter
    for row in reader:  # loop through the CSV row by row
        col_val = row[-1]  # the value in the last column
        if col_val in col_map:  # we already encountered this value
            row[-1] = col_val + "." + str(col_map[col_val])  # append the counter to it
            col_map[col_val] += 1  # increase the counter
        else:
            col_map[col_val] = 1  # create a counter for the next occurrence
        # process the rest of the row if needed here
        # ...
        transf.append(row)  # add the (modified) row to the output list

Of course, you can unpack and process each row as you please, including using a different column than the last one. This is just a generic parser that matches the presented data format and appends an increasing counter to the last column value on each repeated occurrence.
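The counter above is keyed only on the layer code, so it keeps counting across the whole file. Since the question asks for an index per repeated set of X, Y and layer code, a variant keyed on all three may be closer to what is needed. The following is only a sketch: the function name index_repeated_layers is made up for illustration, and it assumes each row has already been reduced to [x, y, bottom, code] and that an (x, y) pair identifies one borehole.

```python
import csv
import io
from collections import defaultdict


def index_repeated_layers(rows):
    """Suffix a layer code with .1, .2, ... when it repeats within a borehole.

    Assumption: each row is [x, y, bottom, code], and (x, y) identifies
    one borehole. The function name is hypothetical.
    """
    seen = defaultdict(int)  # (x, y, code) -> number of occurrences so far
    result = []
    for x, y, bottom, code in rows:
        key = (x, y, code)
        count = seen[key]
        seen[key] += 1
        # The first occurrence keeps the plain code; later ones get a suffix.
        label = code if count == 0 else f"{code}.{count}"
        result.append([x, y, bottom, label])
    return result


# The sample rows from the question, written as ';'-delimited CSV text.
sample = io.StringIO(
    "347591.91;301467.92;19.78;1\n"
    "347591.91;301467.92;106.06;24\n"
    "347591.91;301467.92;118.68;25\n"
    "347591.91;301467.92;120.08;24\n"
    "347591.91;301467.92;274.3;27\n"
)
rows = list(csv.reader(sample, delimiter=';', quotechar='|'))
indexed = index_repeated_layers(rows)
for row in indexed:
    print(*row, sep="\t")
```

With this keying, the second layer 24 in the sample borehole becomes 24.1, while the same code in a borehole with different coordinates would start again without a suffix. Because the coordinates are compared as strings, the same borehole written as 347591.91 and 347591.910 would be treated as two boreholes; convert to float first if the file is not consistent.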
