
For an assignment I'm trying to combine four lists of scraped data into one. All four of them are ordered correctly and shown below.

["Een gezonde samenleving? Het belang van sporten wordt onderschat","Zo vader, zo dochter","Milieuvriendelijk vervoer met waterstof","\"Ik heb zin in wat nog komen gaat\"","Oorlog in Oekraïne"]
["Teamsport","Carsten en Kirsten","Kennisclip","Master Mind","Statement van het CvB"]
["16 maart 2022","10 maart 2022","09 maart 2022","08 maart 2022","07 maart 2022"]
["Directie","Bot","CB","Moniek","Christian"]

My desired output would be like this

[["Een gezonde samenleving? Het belang van sporten wordt onderschat", "Teamsport", "16 maart 2022", "Directie"], [...], [...], [...], [...]]

I've tried some of the solutions found on the internet, but I don't understand some of them; most of them only cover two lists or give errors when I try to implement them.

For more reference, my code looks like this:

urlString :: String
urlString = "https://www.example.com"

--Main function in which we call the other functions
main :: IO()
main = do
    resultTitle <- scrapeURL urlString scrapeHANTitle
    resultSubtitle <- scrapeURL urlString scrapeHANSubtitle
    resultDate <- scrapeURL urlString scrapeHANDate
    resultAuthor <- scrapeURL urlString scrapeHANAuthor
    print resultTitle
    print resultSubtitle
    print resultDate
    print resultAuthor

scrapeHANTitle :: Scraper String [String]
scrapeHANTitle =
    chroots ("div" @: [hasClass "card-news__body"]) scrapeTitle

scrapeHANSubtitle :: Scraper String [String]
scrapeHANSubtitle =
    chroots ("div" @: [hasClass "card-news__body"]) scrapeSubTitle

scrapeHANDate :: Scraper String [String]
scrapeHANDate = 
    chroots ("div" @: [hasClass "card-article__meta__body"]) scrapeDate

scrapeHANAuthor :: Scraper String [String]
scrapeHANAuthor =
    chroots ("div" @: [hasClass "card-article__meta__body"]) scrapeAuthor

-- gets the title of news items
-- https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8192&number=128&utf8=dec
-- some titles contain special characters so use this utf8 table to add conversion
scrapeTitle :: Scraper String String
scrapeTitle = do
    text $ "a" @: [hasClass "card-news__body__title"]

-- gets the subtitle of news items
scrapeSubTitle :: Scraper String String
scrapeSubTitle = do
    text $ "span" @: [hasClass "card-news__body__eyebrow"]

--gets the date on which the news item was posted
scrapeDate :: Scraper String String 
scrapeDate = do
    text $ "div" @: [hasClass "card-news__footer__body__date"]

--gets the author of the news item
scrapeAuthor :: Scraper String String 
scrapeAuthor = do
    text $ "div" @: [hasClass "card-news__footer__body__author"]

I also tried the following below but it gave me a bunch of type errors.

mergeLists :: Maybe [String] -> Maybe [String] -> Maybe [String] -> Maybe [String] -> Maybe [String]
mergeLists = \s1 -> \s2 -> \s3 -> \s4 -> s1 ++ s2 ++ s3 ++ s4
  • Why are there four empty lists at the end? Commented Mar 23, 2022 at 20:00

2 Answers


You can make use of the Monoid instance and work with:

mergeLists :: Maybe [String] -> Maybe [String] -> Maybe [String] -> Maybe [String] -> Maybe [String]
mergeLists s1 s2 s3 s4 = s1 <> s2 <> s3 <> s4
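As a quick standalone check of how the `Semigroup`/`Monoid` instance for `Maybe` behaves here (the values below are made-up placeholders, not real scraper output): `Nothing` values are simply skipped, and the inner lists of the `Just` values are concatenated.

```haskell
-- Sketch demonstrating the Monoid-based merge with dummy data.
mergeLists :: Maybe [String] -> Maybe [String] -> Maybe [String] -> Maybe [String] -> Maybe [String]
mergeLists s1 s2 s3 s4 = s1 <> s2 <> s3 <> s4

main :: IO ()
main = do
    -- Just lists are concatenated; the Nothing is absorbed.
    print (mergeLists (Just ["a"]) (Just ["b"]) Nothing (Just ["c"]))
    -- Just ["a","b","c"]
```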

Here you are however scraping the same page, so you can combine the data from the scraper with:

myScraper :: Scraper String [String]
myScraper = do
    da <- scrapeHANTitle
    db <- scrapeHANSubtitle
    dc <- scrapeHANDate
    dd <- scrapeHANAuthor
    return (da ++ db ++ dc ++ dd)

and then run this with:

main :: IO()
main = do
    result <- scrapeURL urlString myScraper
    print result

or shorter:

main :: IO()
main = scrapeURL urlString myScraper >>= print

4 Comments

I don't know if I'm being stupid, but when I try to apply your solution in the do block, it only prints s1. I call it with print (mergeLists resultSubtitle resultTitle resultDate resultAuthor)
@CentMeister: yes, I made a mistake. See edit.
I have it semi-working right now. I changed it to return (da ++ db ++ dc ++ dd). However, I now get one flat list [da, db, dc, dd] while my intention was [[da(first element), db(first element), dc(first element), dd(first element)], [da(second element), db(second element), dc(second element), dd(second element)], [...], [...]]
@CentMeister: then you use return (transpose [da, db, dc, dd]). (with transpose from Data.List).
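The transpose suggestion from the comment above can be checked in isolation; a minimal sketch with placeholder strings standing in for the four scraped lists:

```haskell
import Data.List (transpose)

-- transpose regroups the four column lists into per-item rows:
-- [[titles], [subtitles], [dates], [authors]]
-- becomes [[title1, subtitle1, date1, author1], ...]
combine :: [String] -> [String] -> [String] -> [String] -> [[String]]
combine da db dc dd = transpose [da, db, dc, dd]

main :: IO ()
main = print (combine ["t1","t2"] ["s1","s2"] ["d1","d2"] ["a1","a2"])
-- [["t1","s1","d1","a1"],["t2","s2","d2","a2"]]
```

In the scraper this means ending myScraper with return (transpose [da, db, dc, dd]) instead of concatenating, and changing its type to Scraper String [[String]].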

You can combine four lists using zip4 from Data.List.

import Data.List

list1 = ["Een gezonde samenleving? Het belang van sporten wordt onderschat","Zo vader, zo dochter","Milieuvriendelijk vervoer met waterstof","\"Ik heb zin in wat nog komen gaat\"","Oorlog in Oekraïne"]
list2 = ["Teamsport","Carsten en Kirsten","Kennisclip","Master Mind","Statement van het CvB"]
list3 = ["16 maart 2022","10 maart 2022","09 maart 2022","08 maart 2022","07 maart 2022"]
list4 = ["Directie","Bot","CB","Moniek","Christian"]

result = zip4 list1 list2 list3 list4

result2 = [[x1,x2,x3,x4] | (x1,x2,x3,x4) <- zip4 list1 list2 list3 list4]

The two results differ slightly: result is a list of tuples, while result2 is a list of lists, as requested. A list of tuples is probably better, because:

  • A list can contain any number of values, but they must all be the same type (Haskell lists are homogeneous)
  • Tuples can contain mixed types, so they offer more flexibility
  • Tuples with two values are a different type from tuples with three values, so if you want collections of exactly four values, using tuples stops the user from squeezing in a collection of three or five

2 Comments

Thank you for your reply. In my case the lists are of equal length, so that is not a problem. However, is it right to assume that if I want to specify types like type Title = String and use those for the scraping, a list of tuples could be used?
If you use zip, or a variant, the algorithm runs until one list runs out. Also, zip will take any types: :t zip4 gives zip4 :: [a] -> [b] -> [c] -> [d] -> [(a, b, c, d)], where a, b, c, d are type variables.
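To illustrate the comment above: type aliases work fine with zip4, since they are just names for String. A sketch (the alias names Title, Subtitle, Date, Author are the hypothetical ones from the follow-up question):

```haskell
import Data.List (zip4)

type Title    = String
type Subtitle = String
type Date     = String
type Author   = String

-- One news item as a four-tuple of the aliased types.
type NewsItem = (Title, Subtitle, Date, Author)

items :: [NewsItem]
items = zip4 ["t1","t2"] ["s1","s2"] ["d1","d2"] ["a1","a2"]
-- [("t1","s1","d1","a1"),("t2","s2","d2","a2")]

main :: IO ()
main = print items
```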
