
I have a regex that extracts the numbers that appear before certain words, but at the moment those words are hard-coded.

Is it also possible to extract the numbers regardless of the text that follows them?

I have this example string:

text = "[' \n\na)\n\n \n\nFactuur\nVerdi Import Schoolfruit\nFactuur nr. : 71201 Koopliedenweg 33\nDeb. nr. : 108636 2991 LN BARENDRECHT\nYour VAT nr. : NL851703884B01 Nederland\nFactuur datum : 10-12-21\nAantal Omschrijving Prijs Bedrag\nOrder number : 77553 Loading date : 09-12-21 Incoterm: : FOT\nYour ref. : SCHOOLFRUIT Delivery date :\nWK50\nD.C. Schoolfruit\n16 Watermeloenen Quetzali 16kg 4 IMPERIAL BR I € 7,70 € 123,20\n360 Watermeloenen Quetzali 16kg 4 IMPERIAL BR I € 7,70 € 2.772,00\n6 Watermeloenen Quetzali 16kg 4 IMPERIAL BR I € 7,/0 € 46,20\n75  Watermeloenen Quetzali 16kg 4 IMPERIAL BR I € 7,70 € 577,50\n9 Watermeloenen Quetzali 16kg 4 IMPERIAL BR I € 7,70 € 69,30\n688 Appels Royal Gala 13kg 60/65 Generica PL I € 5,07 € 3.488,16\n22  Sinaasappels Valencias 15kg 105 Elara ZAI € 6,25 € 137,50\n80 Sinaasappels Valencias 15kg 105 Elara ZAI € 6,25 € 500,00\n160 Sinaasappels Valencias 15kg 105 FVC ZAI € 6,25 € 1.000,00\n320 Sinaasappels Valencias 15kg 105 Generica ZAI € 6,25 € 2.000,00\n160 Sinaasappels Valencias 15kg 105 Noordhoek ZA I € 6,25 € 1.000,00\n61  Sinaasappels Valencias 15kg 105 Noordhoek ZA I € 6,25 € 381,25\nTotaal Colli Totaal Netto Btw Btw Bedrag Totaal Bedrag\n€ 12.095,11 1.088,56\nBetaling binnen 30 dagen\nAchterstand wordt gemeld bij de kredietverzekeringsmaatschappij\nVerDi Import BV ING Bank NV. Rotterdam IBAN number: NL17INGB0006959173 ~~\n\n \n\nKoopliedenweg 38, 2991 LN Barendrecht, The Netherlands SWIFT/BIC: INGBNL2A, VAT number: NL851703884B01 i\nTel, +31 (0}1 80 61 88 11, Fax +31 (0)1 8061 88 25 Chamber of Commerce Rotterdam no. 55424309 VerDi\n\nE-mail: [email protected], www.verdiimport.nl Dutch law shall apply. The Rotterdam District Court shall have exclusive jurisdiction.\n\nrut ard wegetables\n\x0c']"

and these are my search words:

fruit_words = ['Appels', 'Ananas', 'Peen Waspeen',
               'Tomaten Cherry', 'Sinaasappels',
               'Watermeloenen', 'Rettich']

and this is the regular expression:

import re

regex = r"(\d*(?:\.\d+)*)\s*(?:" + '|'.join(re.escape(word)
                                           for word in fruit_words) + ')'

number_found = re.findall(regex, text)
print(number_found)

and the output is then like this:

['16', '360', '6', '75', '9', '688', '22', '80', '160', '320', '160', '61']

My question: is it possible to get the same output without the fruit_words list?

Or maybe without regex?

Thank you

The problem is that I also have this string, from another invoice with the same structure:

text2 = "['A)\n\nFactuur\n\nFactuur nr.\nDeb. nr.\nYour VAT nr.\n\nFactuur datum\n\n72459\n\n108636\nNL851703884B01\n11-01-22\n\nAantal Omschrijving\n\nOrder number\n\nYour ref,\n\nD.C. Schoolfruit\n\n79005 Loading date\nSCHOOLFRUIT Delivery date\nWKO2\n\n782 Peen Breek peen 10xikg B Rabbit NLI\n138 Mandarijnen Clementinas 10kg 3-140 Black MAI\n450 Mandarijnen Clementinas 10kg 3-140 Black MAI\n486 Sinaasappels Navels 15kg 6-90 Gloriosa MAI\n\n60 Sinaasappels Navels 15kg 6-90 Gloriosa MA I\n\nVerDi\nVerDi\nVerDi\n\nTotaal Colli\n\n1.916\n\nVerDi Import BV\n\nKoopliedenweg 38, 2991 LN Barendrecht, The Netherlands\nTel. +31 (0)1 80 61 88 11, Fax +31 (0)1 8061 88 25\nE-mail: [email protected], www.verdiimport.nl\n\nMidden Zuid Noord\nMandarijn 195 158 235\nWortel 202 164 416\nSinaas 302 244 0\n\nTotaal Netto\n\n€ 12.474,40\n\nVerdi Import Schoolfruit\nKoopliedenweg 38\n2991 LN BARENDRECHT\n\nNederland\nPrijs\n\n10-01-22 Incoterm: : FOT\n€ 4,70\n€ 8,00\n€ 8,00\n€ 7,50\n€ 7,50\n\n588\n\n782\n\n546\n\nBtw Btw Bedrag\n\nBedrag\n\n€ 3.675,40\n€ 1.104,00\n€  3.600,00\n€ 3.645,00\n€ 450,00\n\nTotaal Bedrag\n\n€ 1.122,70 € 13.597,10\n\n \n\nBetaling binnen 30 dagen\nAchterstand wordt gemeld bij de kredietverzekeringsmaatschappij\n\nING Bank N.V. Rotterdam IBAN number: NL17INGB0006959173\n\nSWIFT/BIC: INGBNL2A, VAT number: NL851703884B01\nChamber of Commerce Rotterdam no, 55424309\nDutch law shall apply. The Rotterdam District Court shall have exclusive jurisdiction.\n\nSas?\nVerDi\n\nfruit and vegetables\n\x0c']"

Then the output is this:

['72459', '108636', '11-01-22', '79005', '782', '138', '450', '486', '60', '1.916', '2991', '10-01-22', '588', '782', '546']

That is of course wrong, because I only want the numbers that come directly before the fruit type, for example:

522 Sinaasappels Navelinas 15kg 

A number containing a . is not included, as in this string:

text3 = '["a(S (>)\n\n \n\n  \n \n\n \n\n \n\n   \n\nFactuur\nVerdi Import Schoolfruit\nFactuur nr. + 71257 Koopliedenweg 38\nDeb. nr. : 108636 2991 LN BARENDRECHT\nYour VAT nr. : NL851703884B01 Nederland\nFactuur datum : 13-12-21\nAantal Omschrijving Prijs Bedrag\nOrdernumber : 76929 Loading date : 29-11-21 Incoterm: : FOT\nYour ref, : Delivery date :\nD.C. Schoolfruit\n705 Appels Royal Gala 13kq 60/65 Generica PL I € 4,68 € 3.299,40\nOrder number : 76643 Loading date : 25-11-21 Incoterm: : FRA\nYour ref. : Delivery date\nD.C, Schoolfruit\n1.712 Tomaten Cherry pruim 4kg Los Cherie MA I € 2,25 € 3.852,00\n80 Sinaasappels Midnights 15kg 105 BIG 5 ZAI € 6,50 € 520,00\n240 Sinaasappels Midnights 15kg 105 Noordhoek ZAI € 6,50 € 1.560,00\n8 Sinaasappels Valencias 15kg 105 Limpopo ZAI € 6,50 € 52,00\n160 Sinaasappels Valencias 15kg 105 Noordhoek ZAI € 6,50 € 1.040,00\n320 Sinaasappels Valencias 15kg 105 Noordhoek ZAI € 6,50 € 2.080,00\nSINAAS:\nMIDDEN 267 pcm\nNOORD 325 2%; oe\nZUID 216 PARTUNUMMER\nTOTAAL: 808 | DATUN Bi r|\nCHERRY:\nMIDDEN 564\nNOORD 693\nZUID 455\nTOTAAL: 1712 ee\nBETALING\nTotaal Colli Totaal Bedrag\n\n    \n \n \n\n€ 13.519,71\n\n \n \n\nBetaling binnen 30 dagen\nAchterstand wordt gemeld bij de kredietverzekeringsmaatschappij\n\nails,\nVerDi Import BV ING Bank N.V. Rotterdam IBAN number: NL17INGB0006959173 —\nKoopliedenweg 38, 2991 LN Barendrecht, The Netherlands SWIFT/BIC: INGBNL2A, VAT number: NL851703884B01 a\nTel. +31 (0)1 80 61 88 11, Fax +31 (0)1 8061 88 25 Chamber of Commerce Rotterdam no, 55424309 VerDi\n\nE-mail: [email protected], www.verdiimport.nl Dutch law shall apply. The Rotterdam District Court shall have exclusive jurisdiction,\n\nfret and vegetan"]'

I tried it with this text:

verdi48 = '["a(S (>)\n\n \n\n  \n \n\n \n\n \n\n   \n\nFactuur\nVerdi Import Schoolfruit\nFactuur nr. + 71257 Koopliedenweg 38\nDeb. nr. : 108636 2991 LN BARENDRECHT\nYour VAT nr. : NL851703884B01 Nederland\nFactuur datum : 13-12-21\nAantal Omschrijving Prijs Bedrag\nOrdernumber : 76929 Loading date : 29-11-21 Incoterm: : FOT\nYour ref, : Delivery date :\nD.C. Schoolfruit\n705 Appels Royal Gala 13kq 60/65 Generica PL I € 4,68 € 3.299,40\nOrder number : 76643 Loading date : 25-11-21 Incoterm: : FRA\nYour ref. : Delivery date\nD.C, Schoolfruit\n1.712 Tomaten Cherry pruim 4kg Los Cherie MA I € 2,25 € 3.852,00\n80 Sinaasappels Midnights 15kg 105 BIG 5 ZAI € 6,50 € 520,00\n240 Sinaasappels Midnights 15kg 105 Noordhoek ZAI € 6,50 € 1.560,00\n8 Sinaasappels Valencias 15kg 105 Limpopo ZAI € 6,50 € 52,00\n160 Sinaasappels Valencias 15kg 105 Noordhoek ZAI € 6,50 € 1.040,00\n320 Sinaasappels Valencias 15kg 105 Noordhoek ZAI € 6,50 € 2.080,00\nSINAAS:\nMIDDEN 267 pcm\nNOORD 325 2%; oe\nZUID 216 PARTUNUMMER\nTOTAAL: 808 | DATUN Bi r|\nCHERRY:\nMIDDEN 564\nNOORD 693\nZUID 455\nTOTAAL: 1712 ee\nBETALING\nTotaal Colli Totaal Bedrag\n\n    \n \n \n\n€ 13.519,71\n\n \n \n\nBetaling binnen 30 dagen\nAchterstand wordt gemeld bij de kredietverzekeringsmaatschappij\n\nails,\nVerDi Import BV ING Bank N.V. Rotterdam IBAN number: NL17INGB0006959173 —\nKoopliedenweg 38, 2991 LN Barendrecht, The Netherlands SWIFT/BIC: INGBNL2A, VAT number: NL851703884B01 a\nTel. +31 (0)1 80 61 88 11, Fax +31 (0)1 8061 88 25 Chamber of Commerce Rotterdam no, 55424309 VerDi\n\nE-mail: [email protected], www.verdiimport.nl Dutch law shall apply. The Rotterdam District Court shall have exclusive jurisdiction,\n\nfret and vegetan"]'

But that doesn't work: I get "No matches!" 100 times.

This is the string:

text4 = '["a(S (>)\n\n \n\n  \n \n\n \n\n \n\n   \n\nFactuur\nVerdi Import Schoolfruit\nFactuur nr. + 71257 Koopliedenweg 38\nDeb. nr. : 108636 2991 LN BARENDRECHT\nYour VAT nr. : NL851703884B01 Nederland\nFactuur datum : 13-12-21\nAantal Omschrijving Prijs Bedrag\nOrdernumber : 76929 Loading date : 29-11-21 Incoterm: : FOT\nYour ref, : Delivery date :\nD.C. Schoolfruit\n705 Appels Royal Gala 13kq 60/65 Generica PL I € 4,68 € 3.299,40\nOrder number : 76643 Loading date : 25-11-21 Incoterm: : FRA\nYour ref. : Delivery date\nD.C, Schoolfruit\n1.712 Tomaten Cherry pruim 4kg Los Cherie MA I € 2,25 € 3.852,00\n80 Sinaasappels Midnights 15kg 105 BIG 5 ZAI € 6,50 € 520,00\n240 Sinaasappels Midnights 15kg 105 Noordhoek ZAI € 6,50 € 1.560,00\n8 Sinaasappels Valencias 15kg 105 Limpopo ZAI € 6,50 € 52,00\n160 Sinaasappels Valencias 15kg 105 Noordhoek ZAI € 6,50 € 1.040,00\n320 Sinaasappels Valencias 15kg 105 Noordhoek ZAI € 6,50 € 2.080,00\nSINAAS:\nMIDDEN 267 pcm\nNOORD 325 2%; oe\nZUID 216 PARTUNUMMER\nTOTAAL: 808 | DATUN Bi r|\nCHERRY:\nMIDDEN 564\nNOORD 693\nZUID 455\nTOTAAL: 1712 ee\nBETALING\nTotaal Colli Totaal Bedrag\n\n    \n \n \n\n€ 13.519,71\n\n \n \n\nBetaling binnen 30 dagen\nAchterstand wordt gemeld bij de kredietverzekeringsmaatschappij\n\nails,\nVerDi Import BV ING Bank N.V. Rotterdam IBAN number: NL17INGB0006959173 —\nKoopliedenweg 38, 2991 LN Barendrecht, The Netherlands SWIFT/BIC: INGBNL2A, VAT number: NL851703884B01 a\nTel. +31 (0)1 80 61 88 11, Fax +31 (0)1 8061 88 25 Chamber of Commerce Rotterdam no, 55424309 VerDi\n\nE-mail: [email protected], www.verdiimport.nl Dutch law shall apply. The Rotterdam District Court shall have exclusive jurisdiction,\n\nfret and vegetan"]'

Then it prints this:

['705', '80', '240', '8', '160', '320']

The number 1.712 is missing.

And if I have this string:

verdi2 = "['A)\n\nFactuur\n\nFactuur nr.\nDeb. nr.\nYour VAT nr.\n\nFactuur datum\n\n72459\n\n108636\nNL851703884B01\n11-01-22\n\nAantal Omschrijving\n\nOrder number\n\nYour ref,\n\nD.C. Schoolfruit\n\n79005 Loading date\nSCHOOLFRUIT Delivery date\nWKO2\n\n782 Peen Breek peen 10xikg B Rabbit NLI\n138 Mandarijnen Clementinas 10kg 3-140 Black MAI\n450 Mandarijnen Clementinas 10kg 3-140 Black MAI\n486 Sinaasappels Navels 15kg 6-90 Gloriosa MAI\n\n60 Sinaasappels Navels 15kg 6-90 Gloriosa MA I\n\nVerDi\nVerDi\nVerDi\n\nTotaal Colli\n\n1.916\n\nVerDi Import BV\n\nKoopliedenweg 38, 2991 LN Barendrecht, The Netherlands\nTel. +31 (0)1 80 61 88 11, Fax +31 (0)1 8061 88 25\nE-mail: [email protected], www.verdiimport.nl\n\nMidden Zuid Noord\nMandarijn 195 158 235\nWortel 202 164 416\nSinaas 302 244 0\n\nTotaal Netto\n\n€ 12.474,40\n\nVerdi Import Schoolfruit\nKoopliedenweg 38\n2991 LN BARENDRECHT\n\nNederland\nPrijs\n\n10-01-22 Incoterm: : FOT\n€ 4,70\n€ 8,00\n€ 8,00\n€ 7,50\n€ 7,50\n\n588\n\n782\n\n546\n\nBtw Btw Bedrag\n\nBedrag\n\n€ 3.675,40\n€ 1.104,00\n€  3.600,00\n€ 3.645,00\n€ 450,00\n\nTotaal Bedrag\n\n€ 1.122,70 € 13.597,10\n\n \n\nBetaling binnen 30 dagen\nAchterstand wordt gemeld bij de kredietverzekeringsmaatschappij\n\nING Bank N.V. Rotterdam IBAN number: NL17INGB0006959173\n\nSWIFT/BIC: INGBNL2A, VAT number: NL851703884B01\nChamber of Commerce Rotterdam no, 55424309\nDutch law shall apply. The Rotterdam District Court shall have exclusive jurisdiction.\n\nSas?\nVerDi\n\nfruit and vegetables\n\x0c']"

Then it also returns the invoice number (factuur nr.), which should not happen.

Only the numbers that are followed by a fruit name should be returned, i.e. the lines with the €.

3 Answers

1

One approach without regex: first, split the text on \n, because all the numbers we need start on a new line. Then discard the lines that do not start with a digit. Finally, split the remaining lines on whitespace and take the first token of each.

text = "[' \n\na)\n\n \n\nFactuur\nVerdi Import Schoolfruit\nFactuur nr. : 71201 Koopliedenweg 33\nDeb. nr. : 108636 2991 LN BARENDRECHT\nYour VAT nr. : NL851703884B01 Nederland\nFactuur datum : 10-12-21\nAantal Omschrijving Prijs Bedrag\nOrder number : 77553 Loading date : 09-12-21 Incoterm: : FOT\nYour ref. : SCHOOLFRUIT Delivery date :\nWK50\nD.C. Schoolfruit\n16 Watermeloenen Quetzali 16kg 4 IMPERIAL BR I € 7,70 € 123,20\n360 Watermeloenen Quetzali 16kg 4 IMPERIAL BR I € 7,70 € 2.772,00\n6 Watermeloenen Quetzali 16kg 4 IMPERIAL BR I € 7,/0 € 46,20\n75  Watermeloenen Quetzali 16kg 4 IMPERIAL BR I € 7,70 € 577,50\n9 Watermeloenen Quetzali 16kg 4 IMPERIAL BR I € 7,70 € 69,30\n688 Appels Royal Gala 13kg 60/65 Generica PL I € 5,07 € 3.488,16\n22  Sinaasappels Valencias 15kg 105 Elara ZAI € 6,25 € 137,50\n80 Sinaasappels Valencias 15kg 105 Elara ZAI € 6,25 € 500,00\n160 Sinaasappels Valencias 15kg 105 FVC ZAI € 6,25 € 1.000,00\n320 Sinaasappels Valencias 15kg 105 Generica ZAI € 6,25 € 2.000,00\n160 Sinaasappels Valencias 15kg 105 Noordhoek ZA I € 6,25 € 1.000,00\n61  Sinaasappels Valencias 15kg 105 Noordhoek ZA I € 6,25 € 381,25\nTotaal Colli Totaal Netto Btw Btw Bedrag Totaal Bedrag\n€ 12.095,11 1.088,56\nBetaling binnen 30 dagen\nAchterstand wordt gemeld bij de kredietverzekeringsmaatschappij\nVerDi Import BV ING Bank NV. Rotterdam IBAN number: NL17INGB0006959173 ~~\n\n \n\nKoopliedenweg 38, 2991 LN Barendrecht, The Netherlands SWIFT/BIC: INGBNL2A, VAT number: NL851703884B01 i\nTel, +31 (0}1 80 61 88 11, Fax +31 (0)1 8061 88 25 Chamber of Commerce Rotterdam no. 55424309 VerDi\n\nE-mail: [email protected], www.verdiimport.nl Dutch law shall apply. The Rotterdam District Court shall have exclusive jurisdiction.\n\nrut ard wegetables\n\x0c']"
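# Split into lines; every quantity we want starts a line.
a = text.split('\n')
# Keep only non-empty lines whose first character is a digit.
b = [x for x in a if x and x[0].isdigit()]
# Take the first token, skipping values such as '11-01-22' or '1.916'.
c = [x.split()[0] for x in b if x.split()[0].isdigit()]
print(c)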
['16', '360', '6', '75', '9', '688', '22', '80', '160', '320', '160', '61']
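
As the comments on this answer point out, the isdigit() check also drops quantities written with a dot, such as 1.712. A minimal sketch of one way around that, assuming the dot is a thousands separator followed by exactly three digits; the names sample, QUANTITY and leading_quantities are made up for this illustration, and the sample is a shortened stand-in for the invoice text:

```python
import re

# Hypothetical shortened sample, modelled on the invoice lines in the question.
sample = (
    "Factuur datum : 13-12-21\n"
    "D.C. Schoolfruit\n"
    "705 Appels Royal Gala 13kg 60/65 Generica PL I € 4,68 € 3.299,40\n"
    "1.712 Tomaten Cherry pruim 4kg Los Cherie MA I € 2,25 € 3.852,00\n"
    "10-01-22 Incoterm: : FOT\n"
    "1.916\n"
    "\n"
)

# Digits, optionally followed by ".ddd" groups (assumed thousands separator).
QUANTITY = re.compile(r"\d+(?:\.\d{3})*")


def leading_quantities(text):
    """Return the first token of every line that looks like 'number + text'."""
    result = []
    for line in text.split("\n"):
        tokens = line.split()
        # Require text after the number, so a bare total such as "1.916" is skipped.
        if len(tokens) > 1 and QUANTITY.fullmatch(tokens[0]):
            result.append(tokens[0])
    return result


print(leading_quantities(sample))
```

This is still only a line-shape heuristic: any other line that starts with a number followed by text, such as an address line beginning with a postcode, would be picked up as well.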

3 Comments

I added the check if x.split()[0].isdigit() to remove elements such as '11-01-22' and '1.916'.
I had that solved in the regex. I think we are almost there, but a number containing a . is now not included. For example, 1.712 is missing from the list.
@Алексей Р. Almost done. See my updated post.
1

You can extract the block of text between SCHOOLFRUIT and Totaal Colli with (?si)SCHOOLFRUIT(?:(?!SCHOOLFRUIT).)*?Totaal Colli, and then extract the numbers at the start of each line in that block using (?m)^\d+(?:\.\d+)? (where ^ matches any line start position and \d+(?:\.\d+)? matches one or more digits and then an optional sequence of a . and one or more digits, i.e. it matches values such as 80 or 1.712).

See this Python demo:

import re
texts = [
    "[' \n\na)\n\n \n\nFactuur\nVerdi Import Schoolfruit\nFactuur nr. : 71201 Koopliedenweg 33\nDeb. nr. : 108636 2991 LN BARENDRECHT\nYour VAT nr. : NL851703884B01 Nederland\nFactuur datum : 10-12-21\nAantal Omschrijving Prijs Bedrag\nOrder number : 77553 Loading date : 09-12-21 Incoterm: : FOT\nYour ref. : SCHOOLFRUIT Delivery date :\nWK50\nD.C. Schoolfruit\n16 Watermeloenen Quetzali 16kg 4 IMPERIAL BR I € 7,70 € 123,20\n360 Watermeloenen Quetzali 16kg 4 IMPERIAL BR I € 7,70 € 2.772,00\n6 Watermeloenen Quetzali 16kg 4 IMPERIAL BR I € 7,/0 € 46,20\n75  Watermeloenen Quetzali 16kg 4 IMPERIAL BR I € 7,70 € 577,50\n9 Watermeloenen Quetzali 16kg 4 IMPERIAL BR I € 7,70 € 69,30\n688 Appels Royal Gala 13kg 60/65 Generica PL I € 5,07 € 3.488,16\n22  Sinaasappels Valencias 15kg 105 Elara ZAI € 6,25 € 137,50\n80 Sinaasappels Valencias 15kg 105 Elara ZAI € 6,25 € 500,00\n160 Sinaasappels Valencias 15kg 105 FVC ZAI € 6,25 € 1.000,00\n320 Sinaasappels Valencias 15kg 105 Generica ZAI € 6,25 € 2.000,00\n160 Sinaasappels Valencias 15kg 105 Noordhoek ZA I € 6,25 € 1.000,00\n61  Sinaasappels Valencias 15kg 105 Noordhoek ZA I € 6,25 € 381,25\nTotaal Colli Totaal Netto Btw Btw Bedrag Totaal Bedrag\n€ 12.095,11 1.088,56\nBetaling binnen 30 dagen\nAchterstand wordt gemeld bij de kredietverzekeringsmaatschappij\nVerDi Import BV ING Bank NV. Rotterdam IBAN number: NL17INGB0006959173 ~~\n\n \n\nKoopliedenweg 38, 2991 LN Barendrecht, The Netherlands SWIFT/BIC: INGBNL2A, VAT number: NL851703884B01 i\nTel, +31 (0}1 80 61 88 11, Fax +31 (0)1 8061 88 25 Chamber of Commerce Rotterdam no. 55424309 VerDi\n\nE-mail: [email protected], www.verdiimport.nl Dutch law shall apply. The Rotterdam District Court shall have exclusive jurisdiction.\n\nrut ard wegetables\n\x0c']",
    "['A)\n\nFactuur\n\nFactuur nr.\nDeb. nr.\nYour VAT nr.\n\nFactuur datum\n\n72459\n\n108636\nNL851703884B01\n11-01-22\n\nAantal Omschrijving\n\nOrder number\n\nYour ref,\n\nD.C. Schoolfruit\n\n79005 Loading date\nSCHOOLFRUIT Delivery date\nWKO2\n\n782 Peen Breek peen 10xikg B Rabbit NLI\n138 Mandarijnen Clementinas 10kg 3-140 Black MAI\n450 Mandarijnen Clementinas 10kg 3-140 Black MAI\n486 Sinaasappels Navels 15kg 6-90 Gloriosa MAI\n\n60 Sinaasappels Navels 15kg 6-90 Gloriosa MA I\n\nVerDi\nVerDi\nVerDi\n\nTotaal Colli\n\n1.916\n\nVerDi Import BV\n\nKoopliedenweg 38, 2991 LN Barendrecht, The Netherlands\nTel. +31 (0)1 80 61 88 11, Fax +31 (0)1 8061 88 25\nE-mail: [email protected], www.verdiimport.nl\n\nMidden Zuid Noord\nMandarijn 195 158 235\nWortel 202 164 416\nSinaas 302 244 0\n\nTotaal Netto\n\n€ 12.474,40\n\nVerdi Import Schoolfruit\nKoopliedenweg 38\n2991 LN BARENDRECHT\n\nNederland\nPrijs\n\n10-01-22 Incoterm: : FOT\n€ 4,70\n€ 8,00\n€ 8,00\n€ 7,50\n€ 7,50\n\n588\n\n782\n\n546\n\nBtw Btw Bedrag\n\nBedrag\n\n€ 3.675,40\n€ 1.104,00\n€  3.600,00\n€ 3.645,00\n€ 450,00\n\nTotaal Bedrag\n\n€ 1.122,70 € 13.597,10\n\n \n\nBetaling binnen 30 dagen\nAchterstand wordt gemeld bij de kredietverzekeringsmaatschappij\n\nING Bank N.V. Rotterdam IBAN number: NL17INGB0006959173\n\nSWIFT/BIC: INGBNL2A, VAT number: NL851703884B01\nChamber of Commerce Rotterdam no, 55424309\nDutch law shall apply. The Rotterdam District Court shall have exclusive jurisdiction.\n\nSas?\nVerDi\n\nfruit and vegetables\n\x0c']",
    '["a(S (>)\n\n \n\n  \n \n\n \n\n \n\n   \n\nFactuur\nVerdi Import Schoolfruit\nFactuur nr. + 71257 Koopliedenweg 38\nDeb. nr. : 108636 2991 LN BARENDRECHT\nYour VAT nr. : NL851703884B01 Nederland\nFactuur datum : 13-12-21\nAantal Omschrijving Prijs Bedrag\nOrdernumber : 76929 Loading date : 29-11-21 Incoterm: : FOT\nYour ref, : Delivery date :\nD.C. Schoolfruit\n705 Appels Royal Gala 13kq 60/65 Generica PL I € 4,68 € 3.299,40\nOrder number : 76643 Loading date : 25-11-21 Incoterm: : FRA\nYour ref. : Delivery date\nD.C, Schoolfruit\n1.712 Tomaten Cherry pruim 4kg Los Cherie MA I € 2,25 € 3.852,00\n80 Sinaasappels Midnights 15kg 105 BIG 5 ZAI € 6,50 € 520,00\n240 Sinaasappels Midnights 15kg 105 Noordhoek ZAI € 6,50 € 1.560,00\n8 Sinaasappels Valencias 15kg 105 Limpopo ZAI € 6,50 € 52,00\n160 Sinaasappels Valencias 15kg 105 Noordhoek ZAI € 6,50 € 1.040,00\n320 Sinaasappels Valencias 15kg 105 Noordhoek ZAI € 6,50 € 2.080,00\nSINAAS:\nMIDDEN 267 pcm\nNOORD 325 2%; oe\nZUID 216 PARTUNUMMER\nTOTAAL: 808 | DATUN Bi r|\nCHERRY:\nMIDDEN 564\nNOORD 693\nZUID 455\nTOTAAL: 1712 ee\nBETALING\nTotaal Colli Totaal Bedrag\n\n    \n \n \n\n€ 13.519,71\n\n \n \n\nBetaling binnen 30 dagen\nAchterstand wordt gemeld bij de kredietverzekeringsmaatschappij\n\nails,\nVerDi Import BV ING Bank N.V. Rotterdam IBAN number: NL17INGB0006959173 —\nKoopliedenweg 38, 2991 LN Barendrecht, The Netherlands SWIFT/BIC: INGBNL2A, VAT number: NL851703884B01 a\nTel. +31 (0)1 80 61 88 11, Fax +31 (0)1 8061 88 25 Chamber of Commerce Rotterdam no, 55424309 VerDi\n\nE-mail: [email protected], www.verdiimport.nl Dutch law shall apply. The Rotterdam District Court shall have exclusive jurisdiction,\n\nfret and vegetan"]',
    '["a(S (>)\n\n \n\n  \n \n\n \n\n \n\n   \n\nFactuur\nVerdi Import Schoolfruit\nFactuur nr. + 71257 Koopliedenweg 38\nDeb. nr. : 108636 2991 LN BARENDRECHT\nYour VAT nr. : NL851703884B01 Nederland\nFactuur datum : 13-12-21\nAantal Omschrijving Prijs Bedrag\nOrdernumber : 76929 Loading date : 29-11-21 Incoterm: : FOT\nYour ref, : Delivery date :\nD.C. Schoolfruit\n705 Appels Royal Gala 13kq 60/65 Generica PL I € 4,68 € 3.299,40\nOrder number : 76643 Loading date : 25-11-21 Incoterm: : FRA\nYour ref. : Delivery date\nD.C, Schoolfruit\n1.712 Tomaten Cherry pruim 4kg Los Cherie MA I € 2,25 € 3.852,00\n80 Sinaasappels Midnights 15kg 105 BIG 5 ZAI € 6,50 € 520,00\n240 Sinaasappels Midnights 15kg 105 Noordhoek ZAI € 6,50 € 1.560,00\n8 Sinaasappels Valencias 15kg 105 Limpopo ZAI € 6,50 € 52,00\n160 Sinaasappels Valencias 15kg 105 Noordhoek ZAI € 6,50 € 1.040,00\n320 Sinaasappels Valencias 15kg 105 Noordhoek ZAI € 6,50 € 2.080,00\nSINAAS:\nMIDDEN 267 pcm\nNOORD 325 2%; oe\nZUID 216 PARTUNUMMER\nTOTAAL: 808 | DATUN Bi r|\nCHERRY:\nMIDDEN 564\nNOORD 693\nZUID 455\nTOTAAL: 1712 ee\nBETALING\nTotaal Colli Totaal Bedrag\n\n    \n \n \n\n€ 13.519,71\n\n \n \n\nBetaling binnen 30 dagen\nAchterstand wordt gemeld bij de kredietverzekeringsmaatschappij\n\nails,\nVerDi Import BV ING Bank N.V. Rotterdam IBAN number: NL17INGB0006959173 —\nKoopliedenweg 38, 2991 LN Barendrecht, The Netherlands SWIFT/BIC: INGBNL2A, VAT number: NL851703884B01 a\nTel. +31 (0)1 80 61 88 11, Fax +31 (0)1 8061 88 25 Chamber of Commerce Rotterdam no, 55424309 VerDi\n\nE-mail: [email protected], www.verdiimport.nl Dutch law shall apply. The Rotterdam District Court shall have exclusive jurisdiction,\n\nfret and vegetan"]'
]
for text in texts:
    blocks = re.search(r'SCHOOLFRUIT(?:(?!SCHOOLFRUIT).)*?Totaal Colli', text, re.S|re.I)
    if blocks:
        number_found = re.findall(r'^\d+(?:\.\d+)?', blocks.group(), re.M)
        print(number_found)
    else:
        print("No matches!")

['16', '360', '6', '75', '9', '688', '22', '80', '160', '320', '160', '61']
['782', '138', '450', '486', '60']
['1.712', '80', '240', '8', '160', '320']
['1.712', '80', '240', '8', '160', '320']

Regex explanation

  • (?si)SCHOOLFRUIT(?:(?!SCHOOLFRUIT).)*?Totaal Colli:
    • (?si) - re.I and re.S flags are on (i = re.I to make search case insensitive, s = re.S to make . match line break chars)
    • SCHOOLFRUIT - a literal text
    • (?:(?!SCHOOLFRUIT).)*? - a char, zero or more but as few as possible occurrences, that does not start a SCHOOLFRUIT char sequence
    • Totaal Colli - a literal text
  • (?m)^\d+(?:\.\d+)?:
    • (?m) = re.M - the ^ anchor now matches start of any line, not just a string start position
    • ^ - start of the line
    • \d+ - one or more digits
    • (?:\.\d+)? - an optional non-capturing group matching a . and one or more digits.
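
The breakdown above can also be kept next to the pattern itself by compiling it with re.VERBOSE. This is only a sketch on a hypothetical, shortened sample string; BLOCK, NUMBER and sample are names chosen for the illustration, and the patterns are the same two as in the demo:

```python
import re

# Hypothetical shortened invoice text; only the layout mirrors the question's data.
sample = (
    "Factuur nr. : 71201\n"
    "Your ref. : SCHOOLFRUIT Delivery date :\n"
    "D.C. Schoolfruit\n"
    "16 Watermeloenen Quetzali 16kg\n"
    "1.712 Tomaten Cherry pruim 4kg\n"
    "Totaal Colli Totaal Bedrag\n"
    "1.916\n"
)

BLOCK = re.compile(
    r"""
    SCHOOLFRUIT              # literal start marker (case-insensitive via re.I)
    (?:(?!SCHOOLFRUIT).)*?   # any chars, lazily, that do not start another marker
    Totaal\ Colli            # literal end marker (space escaped because of re.X)
    """,
    re.S | re.I | re.X,
)

# A number at the start of a line: digits, then optionally a dot and more digits.
NUMBER = re.compile(r"^\d+(?:\.\d+)?", re.M)

match = BLOCK.search(sample)
numbers = NUMBER.findall(match.group()) if match else []
print(numbers)
```

Because the block has to end at Totaal Colli without crossing another SCHOOLFRUIT, the match here starts at the "D.C. Schoolfruit" line, and the trailing 1.916 after the end marker is left out.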

13 Comments

@Wiktor Stribiżew see my updated post
@mightycodeNewton See the updated answer. Just added the int/float extraction support.
OK, but personally I don't like the idea of using r'SCHOOLFRUIT(?:(?!SCHOOLFRUIT).)*?Totaal Colli', nor the else statement.
@Wiktor Stribiżew because it now also prints "No matches!" many times, which of course should not be returned.
@mightycodeNewton What is wrong with the solution? Doesn't it yield the expected results? There are no "no matches" in the strings you provided. The else branch is meant to catch the cases where the matches were not found, that is normal.
0

Removing the regex, as it is no longer needed.

text = "[' \n\na)\n\n \n\nFactuur\nVerdi Import Schoolfruit\nFactuur nr. : 71201 Koopliedenweg 33\nDeb. nr. : 108636 2991 LN BARENDRECHT\nYour VAT nr. : NL851703884B01 Nederland\nFactuur datum : 10-12-21\nAantal Omschrijving Prijs Bedrag\nOrder number : 77553 Loading date : 09-12-21 Incoterm: : FOT\nYour ref. : SCHOOLFRUIT Delivery date :\nWK50\nD.C. Schoolfruit\n16 Watermeloenen Quetzali 16kg 4 IMPERIAL BR I € 7,70 € 123,20\n360 Watermeloenen Quetzali 16kg 4 IMPERIAL BR I € 7,70 € 2.772,00\n6 Watermeloenen Quetzali 16kg 4 IMPERIAL BR I € 7,/0 € 46,20\n75  Watermeloenen Quetzali 16kg 4 IMPERIAL BR I € 7,70 € 577,50\n9 Watermeloenen Quetzali 16kg 4 IMPERIAL BR I € 7,70 € 69,30\n688 Appels Royal Gala 13kg 60/65 Generica PL I € 5,07 € 3.488,16\n22  Sinaasappels Valencias 15kg 105 Elara ZAI € 6,25 € 137,50\n80 Sinaasappels Valencias 15kg 105 Elara ZAI € 6,25 € 500,00\n160 Sinaasappels Valencias 15kg 105 FVC ZAI € 6,25 € 1.000,00\n320 Sinaasappels Valencias 15kg 105 Generica ZAI € 6,25 € 2.000,00\n160 Sinaasappels Valencias 15kg 105 Noordhoek ZA I € 6,25 € 1.000,00\n61  Sinaasappels Valencias 15kg 105 Noordhoek ZA I € 6,25 € 381,25\nTotaal Colli Totaal Netto Btw Btw Bedrag Totaal Bedrag\n€ 12.095,11 1.088,56\nBetaling binnen 30 dagen\nAchterstand wordt gemeld bij de kredietverzekeringsmaatschappij\nVerDi Import BV ING Bank NV. Rotterdam IBAN number: NL17INGB0006959173 ~~\n\n \n\nKoopliedenweg 38, 2991 LN Barendrecht, The Netherlands SWIFT/BIC: INGBNL2A, VAT number: NL851703884B01 i\nTel, +31 (0}1 80 61 88 11, Fax +31 (0)1 8061 88 25 Chamber of Commerce Rotterdam no. 55424309 VerDi\n\nE-mail: [email protected], www.verdiimport.nl Dutch law shall apply. The Rotterdam District Court shall have exclusive jurisdiction.\n\nrut ard wegetables\n\x0c']"

Your input looks like an invoice, and most invoices have a recognizable start and end around the listed items. In this case (I'm not sure; I could answer better with one more sample) "D.C. Schoolfruit" would be the start and "Totaal" the end. Splitting the text on "\n", looping over the lines, and collecting the rows after the start marker up to the end marker should give you the list of all items on the invoice. Hope this helps.
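
A rough sketch of that loop, assuming the two markers guessed above and a layout where the item rows sit between them; sample, START, END and item_rows are hypothetical names used only for this illustration:

```python
# Hypothetical shortened invoice text with the same line layout as the first sample.
sample = (
    "Factuur nr. : 71201 Koopliedenweg 33\n"
    "D.C. Schoolfruit\n"
    "16 Watermeloenen Quetzali 16kg 4 IMPERIAL BR I € 7,70 € 123,20\n"
    "688 Appels Royal Gala 13kg 60/65 Generica PL I € 5,07 € 3.488,16\n"
    "Totaal Colli Totaal Netto Btw Btw Bedrag Totaal Bedrag\n"
    "€ 12.095,11 1.088,56\n"
)

START = "D.C. Schoolfruit"  # assumed start marker
END = "Totaal"              # assumed end marker


def item_rows(text, start=START, end=END):
    """Collect the non-empty lines between a start marker and an end marker."""
    rows = []
    inside = False
    for line in text.split("\n"):
        stripped = line.strip()
        if stripped.startswith(start):
            inside = True       # item rows begin on the next line
            continue
        if inside and stripped.startswith(end):
            inside = False      # the totals line closes the block
            continue
        if inside and stripped:
            rows.append(stripped)
    return rows


rows = item_rows(sample)
# The quantity is the first token of each row that begins with a digit.
quantities = [row.split()[0] for row in rows if row[0].isdigit()]
print(quantities)
```

Invoices whose text comes out in a different order, or with extra numbered lines between the markers, would need different markers or an additional check on each row.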

1 Comment

I tried to use @yourname, but it doesn't work. See also my comment: if I try your solution, the invoice number (factuur nr.) is also included, which should not happen.
