
I have searched Stack Overflow but haven't found an answer.

I have written a Python script to get data from this website:

https://resources.allsetlearning.com/chinese/grammar/Reduplication_of_adjectives

The page has two or three sentence structures, each followed by 4-5 examples. For example:

Structure 1
- Example 1
- Example 2

Structure 2
- Example 1
- Example 2

Structure 3
- Example 1
- Example 2
- Example 3

I managed to get all the sentence structures and example sentences, but how can I get the example sentences for structure 1, structure 2, and structure 3 separately? Also, how can I exclude the sentences that the page marks as wrong?

from selenium import webdriver
import time

driver = webdriver.Chrome(r"C:\Users\<user>\Documents\chromedriver\chromedriver.exe") # change it

save_file = open("export.txt", "w", encoding="utf8")
wrong_link_file = open("link_with_wrong.txt", "w", encoding="utf8")

url = "https://resources.allsetlearning.com/chinese/grammar/Reduplication_of_adjectives"

time.sleep(1)

driver.get(url)

time.sleep(3)

#jiegou = driver.find_element_by_xpath("/html/body/section/div[3]/div[4]/div[2]/div/div/div[2]/h1")

jiegou = driver.find_elements_by_class_name("jiegou")

usedfor = driver.find_element_by_xpath("//*[@id='ibox']/ul/li[6]/div[2]")

heading = driver.find_element_by_xpath("//*[@id='innerbodycontent']/div/div[2]/h1")

sen = driver.find_elements_by_class_name("spaced")

wrong = driver.find_elements_by_class_name("x")


# if page contain wrong sentence 
found = False
if len(wrong) > 0:
        found = True
        print("..............Found..............." + url)


for j in jiegou:
        jiegou_str = ":: " + j.text + " ::"
        print(jiegou_str)
        save_file.write(jiegou_str)
        print("\n.........................................................\n")

        save_file.write("\n\n")

st_sen=""
for s in sen:
        st_sen = str(s.text)
        if len(wrong) > 0 and wrong[0].text in st_sen:
                continue

        if "。" in st_sen :
                sep = "。"
                st_sen = st_sen.split(sep,1)[0].strip()
                st_sen += " " + sep
        if "?" in st_sen:
                sep = "?"
                st_sen = st_sen.split(sep,1)[0].strip()
                st_sen  += " " + sep

        all_set = st_sen +"\t"+ jiegou_str +"\t"+ usedfor.text +"\t"+ heading.text + "\t" + url

        print(all_set)
        save_file.write(all_set)
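print("\n\n")
save_file.write("\n\n")

# release the output files and the browser once scraping is done
save_file.close()
wrong_link_file.close()
driver.quit()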

1 Answer

To get the structures and their examples in sequence, use WebDriverWait() with visibility_of_all_elements_located() and the following XPath.

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium import webdriver

url = "https://resources.allsetlearning.com/chinese/grammar/Reduplication_of_adjectives"
driver = webdriver.Chrome(r"C:\Users\<user>\Documents\chromedriver\chromedriver.exe")
driver.get(url)
structureelements=WebDriverWait(driver,15).until(EC.visibility_of_all_elements_located((By.XPATH,"//h3[./span[text()='Structure']]/following::div[1]")))

for structure in structureelements:
    print("============================")
    print(structure.text)
    print("========================================")

    # find_elements(By.XPATH, ...) also works on Selenium 4, where find_elements_by_xpath was removed
    for example in structure.find_elements(By.XPATH, ".//following::h3[1]/following::div[1]//li[@class='spaced']"):
        print(example.text)

Output:

============================
Adj. + Adj. + 的 (+ Noun)
========================================
你 的 脸 红 红 的 。
Nǐ de liǎn hóng hóng de.
Your face is red.
宝宝 的 眼睛 大 大 的 。
Bǎobao de yǎnjīng dà dà de.
The baby's eyes are big.
今晚 的 月亮 圆 圆 的 。
Jīnwǎn de yuèliàng yuán yuán de.
The moon is round tonight.
她 爸爸 高 高 胖 胖 的 。
Tā bàba gāo gāo pàng pàng de.
Her father is tall and fat.
我 妹妹 瘦 瘦 小 小 的 。
Wǒ mèimei shòu shòu xiǎo xiǎo de.
My little sister is thin and small.
============================
A A B B + 的 (+ Noun)
========================================
高兴 → 高高兴兴
gāoxìng → gāogāo-xìngxìng
happy
热闹 → 热热闹闹
rènao → rèrè-nāonāo
noisy, boisterous
漂亮 → 漂漂亮亮
piàoliang → piàopiào-liāngliāng
pretty
舒服 → 舒舒服服
shūfu → shūshū-fūfū
comfortable
安静 → 安安静静
ānjìng → ānān-jìngjìng
quiet and still
============================
AABB + 地 + Verb
========================================
我们 清清楚楚 地 看到 他 跟 一 个 胖 胖 的 男人 上 车 了 。
Wǒmen qīngqīng-chǔchǔ de kàndào tā gēn yīgè pàng pàng de nánrén shàngchē le.
We clearly saw him get in the car with a fat man.
我 真 想 舒舒服服 地 躺 在 沙发 上 看 电视 。
Wǒ zhēn xiǎng shūshū-fūfū de tǎng zài shāfā shàng kàn diànshì.
I'd really like to comfortably lie on the couch and watch TV.
你 妈妈 辛辛苦苦 地 做 了 两 个 小时 的 饭,你 怎么 不 吃 ?
Nǐ māma xīnxīn-kǔkǔ de zuò le liǎng gè xiǎoshí de fàn, nǐ zěnme bù chī?
Your mother labored over this meal for two hours, and you aren't going to eat it?
============================
Subj. + ABAB
========================================
妹妹 快 过 生日 了 ,我 打算 给 她 办 一 个 生日 派对 ,热闹 热闹 。
Mèimei kuài guò shēngrì le, wǒ dǎsuàn gěi tā bàn yī gè shēngrì pàiduì, rènao rènao.
My little sister's birthday is coming and I plan to throw her a birthday party and have a blast.
来 ,喝 点 酒 ,高兴 高兴 。
Lái, hē diǎn jiǔ, gāoxìng gāoxìng.
Come on, have a little wine and enjoy yourself.
到 这里 来 凉快 凉快 。
Dào zhèlǐ lái liángkuai liángkuai.
Come over here and cool off.
我 想 去 外面 走走 ,安静 一下 。
Wǒ xiǎng qù wàimiàn zǒuzou, ānjìng yīxià.
I'd like to take a walk outside, get some quiet time.
想 不 想 去 做 个 按摩 ,放松 一下 。
Xiǎng bu xiǎng qù zuò gè ànmó, fàngsōng yīxià.
Would you like to go get a massage and unwind?

2 Comments

Aren't 2 divs missing with //h3[./span[text()='Structure']]/following::div[1]? It returns 4 structures instead of 6, since the first and last grammar points each have 2 structures. Maybe a one-line XPath would be enough for this problem: //div[@class='jiegou']/p[1]|//li[@class="spaced"].
Yeah, I tried it. If two structures are present, one of them gets missed. What are the possible solutions to this?
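
One possible way to act on the union XPath suggested in the comment above is to walk its results in page order and start a new group each time a structure appears. The sketch below is only an illustration, not a tested fix for this page. The helper group_by_structure is a hypothetical name, the sample list is a hand-written stand-in for scraped elements, and it assumes the matches arrive in document order with structures as "p" nodes and examples as "li" nodes.

```python
from typing import Iterable, List, Tuple


def group_by_structure(nodes: Iterable[Tuple[str, str]]) -> List[Tuple[str, List[str]]]:
    """Group example texts under the most recent structure text.

    `nodes` holds (tag_name, text) pairs in document order. A "p" node is
    treated as a structure; any other node is treated as an example.
    (Hypothetical helper, not part of Selenium.)
    """
    groups: List[Tuple[str, List[str]]] = []
    for tag_name, text in nodes:
        if tag_name == "p":
            # a new structure starts a new, initially empty group
            groups.append((text, []))
        elif groups:
            # examples seen before any structure are ignored
            groups[-1][1].append(text)
    return groups


# Hand-written stand-in for the scraped elements (assumption, not real output)
sample = [
    ("p", "Adj. + Adj. + 的 (+ Noun)"),
    ("li", "你 的 脸 红 红 的 。"),
    ("li", "宝宝 的 眼睛 大 大 的 。"),
    ("p", "AABB + 地 + Verb"),
    ("li", "我 真 想 舒舒服服 地 躺 在 沙发 上 看 电视 。"),
]

groups = group_by_structure(sample)
for structure, examples in groups:
    print(":: " + structure + " ::")
    for example in examples:
        print("  " + example)
```

With Selenium, the pairs could be built as (el.tag_name, el.text) for each element returned by driver.find_elements(By.XPATH, "//div[@class='jiegou']/p[1] | //li[@class='spaced']"). When two structures sit next to each other with no examples in between, the first one ends up with an empty list, so it may need to be merged with the structure that follows it.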
