2. Send a request to fetch the page content
Use the requests library to send a request and fetch the page content. Here we take fetching the homepage novel list as an example:
import requests

url = 'https://www.example.com'  # replace with the target site's homepage URL
response = requests.get(url)
response.encoding = 'utf-8'  # set the response encoding to match the page encoding
html_content = response.text
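Members-only pages usually require being logged in. A requests.Session can carry a browser-like User-Agent header and the login cookies across all requests. This is a sketch: the cookie name `session_id` and its value are placeholders you would copy from your own logged-in browser session via the developer tools.

```python
import requests

session = requests.Session()
session.headers.update({
    # identify as a normal browser instead of the default python-requests agent
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'
})
# Hypothetical cookie name -- copy the real login cookie from your browser's dev tools
session.cookies.set('session_id', 'YOUR_COOKIE_VALUE')

# response = session.get(url)  # use the session in place of requests.get
```

Every `session.get(...)` call then sends the header and cookie automatically, which is what lets the crawler see member-only content.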
3. Parse the page and extract novel information
Use the BeautifulSoup library to parse the page content and extract the novel's title, author, word count, and other information:
from bs4 import BeautifulSoup

soup = BeautifulSoup(html_content, 'html.parser')
title = soup.find('h1', class_='title').text  # extract the title
author = soup.find('span', class_='author').text  # extract the author
word_count = soup.find('span', class_='wordcount').text  # extract the word count
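Note that `find()` returns None when a selector does not match, so `.text` raises AttributeError on any page whose structure differs. A small helper (a sketch, not part of the original code; the class names mirror the example above) makes extraction fail gracefully:

```python
from bs4 import BeautifulSoup

def safe_text(soup, tag, cls, default=''):
    """Return the stripped text of the first matching tag, or a default."""
    node = soup.find(tag, class_=cls)
    return node.get_text(strip=True) if node else default

# inline sample HTML standing in for a fetched page
sample = '<h1 class="title">Example Novel</h1><span class="author">A. Writer</span>'
soup = BeautifulSoup(sample, 'html.parser')
print(safe_text(soup, 'h1', 'title'))               # Example Novel
print(safe_text(soup, 'span', 'wordcount', 'unknown'))  # unknown
```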
4. Save the novel content
Save the extracted novel content to a local file. Here we save it as a txt file:
with open('novel.txt', 'w', encoding='utf-8') as f:
    f.write(title + '\n')
    f.write(author + '\n')
    f.write(word_count + '\n')
    f.write(soup.find('div', class_='content').text)  # extract and save the novel body text
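If you want to name the saved file after the novel's title, note that titles can contain characters that are invalid in file names. A small sanitizing helper (a sketch, not part of the original code) handles this:

```python
import re

def safe_filename(title: str, ext: str = '.txt') -> str:
    """Replace characters that are invalid in Windows/Unix file names."""
    cleaned = re.sub(r'[\\/:*?"<>|]', '_', title).strip()
    return (cleaned or 'novel') + ext

print(safe_filename('My Novel: Part 1/2'))  # My Novel_ Part 1_2.txt
```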
5. Download novel images
If the novel includes images, we can use the requests library to download them and save them locally. For example, downloading the cover image:
cover_url = soup.find('img', class_='cover')['src']  # extract the cover image URL
response = requests.get(cover_url)
with open('novel_cover.jpg', 'wb') as f:
    f.write(response.content)  # save the image locally
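One pitfall: the server may return an HTML error page instead of image bytes (for example when the session is not logged in), and the code above would save it as a .jpg anyway. A small guard (a sketch; checking the JPEG magic bytes is one simple heuristic) avoids that:

```python
def save_jpeg(data: bytes, path: str) -> bool:
    """Write data to path only if it starts with the JPEG magic bytes."""
    if not data.startswith(b'\xff\xd8'):
        return False  # not a JPEG -- likely an error page
    with open(path, 'wb') as f:
        f.write(data)
    return True
```

Use it as `save_jpeg(response.content, 'novel_cover.jpg')` in place of the bare write above.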
6. Complete code example
Putting the steps above together gives the complete Python code for crawling a members-only novel:
import requests
from bs4 import BeautifulSoup
import os

def get_novel_info(url):
    response = requests.get(url)
    response.encoding = 'utf-8'
    html_content = response.text
    soup = BeautifulSoup(html_content, 'html.parser')
    title = soup.find('h1', class_='title').text
    author = soup.find('span', class_='author').text
    word_count = soup.find('span', class_='wordcount').text
    content = soup.find('div', class_='content').text
    return title, author, word_count, content, url + '/images/cover.jpg'  # cover image URL (assuming it lives under the same path)

def save_novel(title, author, word_count, content, cover_url):
    with open('novel.txt', 'w', encoding='utf-8') as f:
        f.write(title + '\n')
        f.write(author + '\n')
        f.write(word_count + '\n')
        f.write(content)
    response = requests.get(cover_url)
    with open('novel_cover.jpg', 'wb') as f:
        f.write(response.content)
    print('Novel saved!')
    return True

if __name__ == '__main__':
    novel_url = 'https://www.example.com/novel/1'  # replace with the target novel's URL
    if not os.path.exists('novel'):  # create a 'novel' folder for the files and images if it does not exist (optional)
        os.mkdir('novel')
    title, author, word_count, content, cover_url = get_novel_info(novel_url)
    save_novel(title, author, word_count, content, cover_url)
That is how to crawl members-only novels with Python. Note that different sites are structured differently, so in practice you will need to adjust the selectors to match the target site's actual HTML. A crawler also puts load on the site: throttle your request rate sensibly and comply with the site's rules.
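To follow the throttling advice above, a minimal rate-limiting wrapper can space out requests. This is a sketch; the one-second default delay is an arbitrary choice, and you would call `wait()` before each `requests.get(...)`:

```python
import time

class Throttle:
    """Call wait() before each request to enforce a minimum delay between requests."""
    def __init__(self, delay: float = 1.0):
        self.delay = delay
        self.last = 0.0  # time of the previous request

    def wait(self):
        elapsed = time.monotonic() - self.last
        if elapsed < self.delay:
            time.sleep(self.delay - elapsed)  # sleep off the remainder of the delay
        self.last = time.monotonic()
```

For example, in a loop over chapter URLs: `throttle.wait()` then `session.get(chapter_url)`.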