# The requests module

The requests module sends HTTP requests from Python, simulating what a browser does.

# Usage steps

1. Specify the $URL$
2. Send the request
3. Get the response data
4. Persist the data to storage

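Python

import requests

url = 'https://www.bilibili.com/'  # step 1: specify the URL

response = requests.get(url=url)  # step 2: send the request

page_text = response.text  # step 3: get the response body as text

print(page_text)

with open('bilibili.html', 'w', encoding='utf-8') as fp:  # step 4: persist to disk
    fp.write(page_text)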
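
The four steps above can also be wrapped in a small reusable function. The following is only a sketch: the name `fetch_and_save` and its parameters are hypothetical, and `session` is assumed to be a `requests.Session()` (or any object with a compatible `get` method). It adds a timeout and a status check, which the original snippet does not have.

```python
from pathlib import Path


def fetch_and_save(session, url, path, timeout=10):
    """Fetch `url` via `session.get` and write the body to `path` as UTF-8.

    `session` is assumed to behave like requests.Session (hypothetical helper).
    Returns the number of characters written.
    """
    response = session.get(url, timeout=timeout)  # send the request
    response.raise_for_status()                   # raise on 4xx / 5xx responses
    page_text = response.text                     # response body as text
    Path(path).write_text(page_text, encoding='utf-8')  # persist to disk
    return len(page_text)
```

With requests installed, a call might look like `fetch_and_save(requests.Session(), 'https://www.bilibili.com/', 'bilibili.html')`.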

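# "Cracking" Baidu Translate

After a word is typed into Baidu Translate, the page refreshes only partially (an $Ajax$ request $\to$ $XHR$).

It is a $post$ request, and the response is a set of $json$ data.

The site's anti-scraping strategy: $UA$ detection

Counter-strategy: $UA$ spoofing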

Python

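import requests

url = 'https://www.bilibili.com/'  # same URL as in the earlier example

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/135.0.0.0 Safari/537.36'
}

response = requests.get(url=url, headers=headers)  # send the request with a browser-like User-Agent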
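
When several requests need the same spoofed User-Agent, one option is to set it once on a session instead of passing it to every call. This is a minimal sketch under assumptions: `spoof_user_agent` and `DEFAULT_UA` are hypothetical names, and `session` is assumed to be a `requests.Session()`, whose `headers` mapping is sent with every request made through it.

```python
# Browser-like User-Agent string copied from the notes above.
DEFAULT_UA = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/135.0.0.0 Safari/537.36')


def spoof_user_agent(session, user_agent=DEFAULT_UA):
    """Set `user_agent` as the session-wide User-Agent header (hypothetical helper).

    Other headers already present on the session are left untouched.
    """
    session.headers.update({'User-Agent': user_agent})
    return session
```

With requests installed, `session = spoof_user_agent(requests.Session())` followed by `session.get(url)` would send the header without repeating the `headers=` argument.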

Scraper code

Python

import json

import requests

url = 'https://fanyi.baidu.com/sug'
# header = {'User-Agent' : 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/135.0.0.0 Safari/537.36'}
data = {'kw': input('Enter a word: ')}  # form field carrying the word to look up
response = requests.post(url=url, data=data)  # send the POST request

result = response.json()  # parse the JSON response body

with open(data['kw'] + '.json', 'w', encoding='utf-8') as fp:
    json.dump(result, fp=fp, ensure_ascii=False)  # keep non-ASCII text readable

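# Douban movie detail data

We only need part of the information on the page. For now, we do not deal with parsing the page itself (the data arrives through an $Ajax$ request).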
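
Because the data comes from an Ajax request, one way to collect it is to call that request's URL directly with its query parameters and save the JSON it returns. The notes do not record the endpoint, so the sketch below takes the URL and parameters as arguments. The names `fetch_json` and `save_json` are hypothetical, and `session` is assumed to be a `requests.Session()`.

```python
import json


def fetch_json(session, url, params, headers=None, timeout=10):
    """GET `url` with query `params` and return the decoded JSON body.

    `session` is assumed to behave like requests.Session (hypothetical helper).
    """
    response = session.get(url, params=params, headers=headers, timeout=timeout)
    response.raise_for_status()  # raise on 4xx / 5xx responses
    return response.json()


def save_json(data, path):
    """Write `data` to `path` as UTF-8 JSON, keeping non-ASCII text readable."""
    with open(path, 'w', encoding='utf-8') as fp:
        json.dump(data, fp, ensure_ascii=False)
```

The actual URL and parameter names should be read from the XHR entry in the browser's network panel. Tutorials often cite `https://movie.douban.com/j/chart/top_list` with parameters such as `type`, `interval_id`, `start`, and `limit`, but that is an assumption here, not something these notes confirm.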
