A Detailed Guide to Using the requests Library for Python Web Scraping

Posted by the editor on 2026-07-02

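Requests is a simple, easy-to-use HTTP library for Python, and its API is much more concise than urllib's. It is a third-party library written in Python, built on urllib3, and released as open source under the Apache 2.0 license.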

Official Requests documentation: http://docs.python-requests.org/zh_CN/latest/user/quickstart.html

1. Sending GET requests with requests and common response attributes

1.1. Sending a GET request without parameters

1. A basic GET request with requests
import requests

response = requests.get('http://httpbin.org/get')
print(response.text)  # response.text holds the response body as a string

'''Output:
{
  "args": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "close",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.20.1"
  },
  "origin": "114.221.2.90",
  "url": "http://httpbin.org/get"
}
'''

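1.2. Sending a GET request with parameters

1. Sending a GET request with query parameters
Method 1: write the query string directly into the URL.
import requests
response = requests.get("http://httpbin.org/get?name=germey&age=22")
print(response.text)
Method 2: pass a dict through the params argument and let requests build the query string.
import requests

data = {
    'name': 'germey',
    'age': 22
}
response = requests.get("http://httpbin.org/get", params=data)
print(response.text)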
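
To see what the params argument does to the URL, a request can be prepared without being sent. This is only a minimal sketch: it uses requests.Request and its prepare() method, which the article itself does not use, to show that both methods above end up with the same URL.

```python
import requests

params = {'name': 'germey', 'age': 22}

# Build the request without sending it, then inspect the final URL.
prepared = requests.Request('GET', 'http://httpbin.org/get', params=params).prepare()
print(prepared.url)  # http://httpbin.org/get?name=germey&age=22
```

Because nothing goes over the network here, this is a convenient way to check how a parameter dict will be encoded before running a crawler.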

1.3. Parsing JSON with requests

1. Parsing a JSON response
import requests
import json

response = requests.get("http://httpbin.org/get")
print(type(response.text))
print('-------------------------------')
print(response.json())  # parse the response body as JSON (a dict here)
print('-------------------------------')
print(json.loads(response.text))  # json.loads on the body text gives the same result
print('-------------------------------')
print(type(response.json()))
'''Output:
<class 'str'>
-------------------------------
{'args': {}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Connection': 'close', 'Host': 'httpbin.org', 'User-Agent': 'python-requests/2.20.1'}, 'origin': '114.221.2.90', 'url': 'http://httpbin.org/get'}
-------------------------------
{'args': {}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Connection': 'close', 'Host': 'httpbin.org', 'User-Agent': 'python-requests/2.20.1'}, 'origin': '114.221.2.90', 'url': 'http://httpbin.org/get'}
-------------------------------
<class 'dict'>
'''
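
A body that is not valid JSON (for example an HTML error page) makes both response.json() and json.loads raise an exception that is a subclass of ValueError. The sketch below shows one way to guard against that using only the standard library; the helper name parse_json_body and the sample strings are made up for illustration.

```python
import json

def parse_json_body(text):
    """Return the decoded JSON value, or None if text is not valid JSON."""
    try:
        return json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None

ok = parse_json_body('{"args": {}, "url": "http://httpbin.org/get"}')
bad = parse_json_body('<html>400 Bad Request</html>')
print(type(ok), ok['url'])
print(bad)
```

In a crawler the same helper could be called with response.text, so that one malformed page does not stop the whole run.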

1.4. Fetching text or binary data with a GET request

1. Reading a response as text and as binary data
import requests

response = requests.get("https://github.com/favicon.ico")
print(type(response.text), type(response.content))  # <class 'str'> <class 'bytes'>
print(response.text)     # the body decoded to a string
print(response.content)  # the raw body as bytes

2. Saving the fetched data to a local file with the file object's write method

import requests

response = requests.get("https://github.com/favicon.ico")
with open('./favicon.ico', 'wb') as f:  # save the response body into the current directory
    f.write(response.content)
    # no explicit f.close() is needed: the with block closes the file

1.5. Adding headers to a GET request

Most crawlers need to send a headers argument, because many sites reject requests that lack browser-like headers; the key header is User-Agent. For example, fetching the Zhihu explore page below fails with 400 Bad Request when no headers are sent.

import requests

response = requests.get("https://www.zhihu.com/explore")
print(response.text)
'''
<html>
<head><title>400 Bad Request</title></head>
<body bgcolor="white">
<center><h1>400 Bad Request</h1></center>
<hr><center>openresty</center>
</body>
</html>

'''

With a headers argument added to the GET request, Zhihu can be accessed normally.

import requests

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'
}
response = requests.get("https://www.zhihu.com/explore", headers=headers)
print(response.text)
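
When many requests need the same User-Agent, one option is to set it once on a Session instead of passing headers to every call. The following is a sketch under that assumption; it prepares a request without sending it, so the headers that would go out can be inspected offline.

```python
import requests

session = requests.Session()
# Headers set on the session are merged into every request it sends.
session.headers.update({
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'
})

# Prepare (but do not send) a request to inspect its outgoing headers.
request = requests.Request('GET', 'https://www.zhihu.com/explore')
prepared = session.prepare_request(request)

# Header lookup is case-insensitive, so 'user-agent' finds 'User-Agent'.
print(prepared.headers['user-agent'])
```

This also shows why the lowercase 'user-agent' key used above works: requests treats header names case-insensitively.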

1.6. Extracting common attributes from a GET response

import requests

response = requests.get('https://www.baidu.com/')
print(type(response))  # type of the returned object
print('--------------------------------------')
print(response.status_code)  # HTTP status code of the response
print('--------------------------------------')
print(type(response.text))
print('--------------------------------------')
print(response.text)
print('--------------------------------------')
print(response.cookies)

'''Output:
<class 'requests.models.Response'>
--------------------------------------
200
--------------------------------------
<class 'str'>
--------------------------------------
<!DOCTYPE html>
<!--STATUS OK--><html> <head><meta http-equiv=content-type content=text/html;charset=utf-8><meta http-equiv=X-UA-Compatible content=IE=Edge><meta content=always name=referrer><link rel=stylesheet type=text/css href=https://ss1.bdstatic.com/5eN1bjq8AAUYm2zgoY3K/r/www/cache/bdorz/baidu.min.css>
.......remaining HTML omitted (its Chinese text is printed as garbled characters)...................

--------------------------------------
<RequestsCookieJar[<Cookie BDORZ=27315 for .baidu.com/>]>
'''
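
The garbled Chinese characters in the response.text output above are most likely an encoding mismatch. The assumption here is that the server did not declare a charset in its Content-Type header, in which case requests falls back to ISO-8859-1 when decoding text. With requests, the usual remedy is to set response.encoding = 'utf-8' before reading response.text, or to call response.content.decode('utf-8'). The sketch below reproduces the effect with the standard library only, using the page title as sample text.

```python
# The page title is UTF-8 text; these are the bytes a server would send.
raw = '百度一下，你就知道'.encode('utf-8')

# Decoding UTF-8 bytes as ISO-8859-1 yields the garbled form.
garbled = raw.decode('iso-8859-1')

# Encoding back with the wrong codec and decoding with the right one repairs it.
repaired = garbled.encode('iso-8859-1').decode('utf-8')
print(repaired)  # 百度一下，你就知道
```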

2. Sending POST requests with requests (very similar to GET)

2.1. Sending a POST request with the parameters passed as a dict

import requests

data = {'name': 'germey', 'age': '22'}
response = requests.post("http://httpbin.org/post", data=data)
print(response.text)

'''Output:
{
  "args": {},
  "data": "",
  "files": {},
  "form": {
    "age": "22",
    "name": "germey"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "close",
    "Content-Length": "18",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.19.1"
  },
  "json": null,
  "origin": "114.221.2.90",
  "url": "http://httpbin.org/post"
}
'''
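
The output above shows that data= sends a form-encoded body (note the Content-Type and the Content-Length of 18). As a sketch of the difference from the json= argument, which the article does not cover, the two requests can be prepared without sending them and compared:

```python
import json
import requests

data = {'name': 'germey', 'age': '22'}
url = 'http://httpbin.org/post'

# data= sends an HTML-form body (application/x-www-form-urlencoded).
form = requests.Request('POST', url, data=data).prepare()
print(form.headers['Content-Type'], form.body)

# json= serializes the dict to JSON and sets Content-Type: application/json.
as_json = requests.Request('POST', url, json=data).prepare()
print(as_json.headers['Content-Type'], json.loads(as_json.body))
```

Which one to use depends on what the target server expects: HTML form handlers usually want data=, while JSON APIs usually want json=.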

2.2. Sending a POST request with headers

import requests

data = {'name': 'germey', 'age': '22'}
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36'
}
response = requests.post("http://httpbin.org/post", data=data, headers=headers)
print(response.json())

'''Output:
{'args': {}, 'data': '', 'files': {}, 'form': {'age': '22', 'name': 'germey'}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Connection': 'close', 'Content-Length': '18', 'Content-Type': 'application/x-www-form-urlencoded', 'Host': 'httpbin.org', 'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36'}, 'json': None, 'origin': '114.221.2.90', 'url': 'http://httpbin.org/post'}
'''

2.3. Common response attributes

import requests

response = requests.get('http://www.jianshu.com')
print(type(response.status_code), response.status_code)
print(type(response.headers), response.headers)
print(type(response.cookies), response.cookies)
print(type(response.url), response.url)
print(type(response.history), response.history)
3. Other common uses of the requests library

3.1. Uploading files

1. Upload the favicon.ico file in the current directory to a remote server
import requests

files = {'file': open('./favicon.ico', 'rb')}
response = requests.post("http://httpbin.org/post", files=files)
print(response.text)

'''Output:
{
  "args": {},
  "data": "",
  "files": {
    "file": "data:application/octet-stream;base64,AAABAAIAEBAAAAEAIAAoBQAAJgAAACAgAAABACAAKBQAAE4FAAAoAAAAEAAAACAAAAABACAAAAA
.......content omitted...................
  },
  "form": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "close",
    "Content-Length": "6665",
    "Content-Type": "multipart/form-data; boundary=64baa48fe6e9aa9985fd4758bc97f1e9",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.20.1"
  },
  "json": null,
  "origin": "114.221.2.90",
  "url": "http://httpbin.org/post"
}

'''
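
The files argument also accepts a (filename, content, content_type) tuple, which lets the upload carry an explicit file name and MIME type. The following sketch prepares such an upload without sending it. The payload bytes are a placeholder standing in for the real contents of favicon.ico, and 'image/x-icon' is an assumed content type.

```python
import requests

# Placeholder bytes standing in for the contents of favicon.ico (assumption).
payload = b'\x00\x00\x01\x00'

# Each value may be a (filename, content, content_type) tuple.
files = {'file': ('favicon.ico', payload, 'image/x-icon')}

prepared = requests.Request('POST', 'http://httpbin.org/post', files=files).prepare()
print(prepared.headers['Content-Type'])  # multipart/form-data; boundary=...
print(b'filename="favicon.ico"' in prepared.body)  # True
```

When uploading a real file, opening it inside a with open(..., 'rb') block keeps the file handle from staying open after the request finishes.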

3.2. Reading a site's cookies

import requests

response = requests.get("https://www.baidu.com")
print(response.cookies)
for key, value in response.cookies.items():
    print(key + '=' + value)
'''Output:
<RequestsCookieJar[<Cookie BDORZ=27315 for .baidu.com/>]>
BDORZ=27315
'''

3.3. Simulating a login with Session()

import requests

s = requests.Session()
s.get('http://httpbin.org/cookies/set/number/123456789')  # the server sets a cookie on this session
response = s.get('http://httpbin.org/cookies')  # the same session sends the cookie back
print(response.text)
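
What makes a Session useful for logins is that cookies stored on it are attached to later requests automatically. The sketch below imitates that offline: the cookie is set by hand in place of a server's Set-Cookie header (an assumption made so the example runs without a network), and the follow-up request is prepared rather than sent.

```python
import requests

s = requests.Session()
# Simulate the cookie a server would set through a Set-Cookie header.
s.cookies.set('number', '123456789', domain='httpbin.org', path='/')

# Prepare a later request on the same session without sending it.
prepared = s.prepare_request(requests.Request('GET', 'http://httpbin.org/cookies'))
print(prepared.headers['Cookie'])  # number=123456789

# A request prepared outside the session carries no cookie.
plain = requests.Request('GET', 'http://httpbin.org/cookies').prepare()
print('Cookie' in plain.headers)  # False
```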

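3.4. Certificate verification

1. Some sites present a certificate that cannot be verified against the locally trusted certificates, and requesting them directly raises an error. For example, requesting the 12306 site as shown below fails with SSLError.

import requests

response = requests.get('https://www.12306.cn')
print(response.status_code)

2. In this case, pass verify=False with the GET request to skip certificate verification, and call urllib3.disable_warnings() to suppress the resulting warning.

import requests
from requests.packages import urllib3
urllib3.disable_warnings()
response = requests.get('https://www.12306.cn', verify=False)  # disable certificate verification
print(response.status_code)

3. You can also pass a local certificate and private key through the cert argument (requests presents them to the server as a client-side certificate), as shown below:

import requests

response = requests.get('https://www.12306.cn', cert=('./server.crt', './key'))
print(response.status_code)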
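
A third option, not covered above, is to keep verification on but point verify at a CA bundle file that contains the site's issuing certificate. The sketch below only configures a Session and does not contact any server; the helper name make_session and the path './my-ca.pem' are hypothetical.

```python
import requests

def make_session(ca_bundle=None):
    """Return a Session that verifies servers against ca_bundle if given."""
    session = requests.Session()
    if ca_bundle is not None:
        # verify accepts a path to a CA bundle file instead of True/False.
        session.verify = ca_bundle
    return session

default_session = make_session()
custom_session = make_session('./my-ca.pem')  # hypothetical path
print(default_session.verify, custom_session.verify)
```

Unlike verify=False, this keeps the connection protected against forged certificates while still accepting a certificate the default bundle does not know.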

3.5. Configuring proxies in requests

1. Setting up proxies
import requests

proxies = {
  "http": "http://127.0.0.1:9743",
  "https": "https://127.0.0.1:9743",
}

response = requests.get("https://www.taobao.com", proxies=proxies)
print(response.status_code)

2. Proxy setup, method 2: a proxy that requires a username and password
import requests

proxies = {
    "http": "http://user:password@127.0.0.1:9743/",
    # the target URL is https, so an "https" entry is needed for the proxy to be used
    "https": "http://user:password@127.0.0.1:9743/",
}
response = requests.get("https://www.taobao.com", proxies=proxies)
print(response.status_code)
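
The keys of the proxies dict are matched against the scheme of the URL being requested, so a dict with only an "http" entry has no effect on an https:// URL. The sketch below illustrates this with select_proxy, a helper from requests.utils that requests uses internally; treating it as a stable public API is an assumption of this example.

```python
from requests.utils import select_proxy

url = 'https://www.taobao.com'

http_only = {'http': 'http://user:password@127.0.0.1:9743/'}
both = {
    'http': 'http://user:password@127.0.0.1:9743/',
    'https': 'http://user:password@127.0.0.1:9743/',
}

print(select_proxy(url, http_only))  # None: no entry matches the https scheme
print(select_proxy(url, both))       # the "https" entry is selected
```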

3.6. Timeouts and exception handling

import requests
from requests.exceptions import ReadTimeout
try:
    response = requests.get("http://httpbin.org/get", timeout=0.8)
    print(response.status_code)
    print('-----------------exception boundary-------------------')
except ReadTimeout:
    print('Ha ha ha, Timeout')

'''Test result 1:
200
-----------------exception boundary-------------------
'''
'''Test result 2:
Ha ha ha, Timeout

'''
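
The example above catches only ReadTimeout, so a timeout that occurs while the connection is still being established (ConnectTimeout) would not be caught. A sketch of a slightly more general version follows. The helper name fetch_status and the timeout values are illustrative assumptions; the helper is defined but not called here, since calling it needs network access.

```python
import requests
from requests.exceptions import ConnectTimeout, ReadTimeout, Timeout

def fetch_status(url, timeout=(3.05, 10)):
    """Return the status code, or None if the request times out.

    timeout is a (connect, read) pair in seconds; the values are illustrative.
    """
    try:
        return requests.get(url, timeout=timeout).status_code
    except Timeout:  # base class of both ConnectTimeout and ReadTimeout
        return None

print(issubclass(ConnectTimeout, Timeout), issubclass(ReadTimeout, Timeout))  # True True
```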

3.7. Authentication

Some sites first show a login prompt and require a username and password before anything else can be done. In that case, pass the credentials through the auth argument, as shown below:

import requests
from requests.auth import HTTPBasicAuth

r = requests.get('http://120.27.34.24:9001', auth=HTTPBasicAuth('user', '123'))
print(r.status_code)
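
HTTP Basic authentication works by adding an Authorization header to the request. As a sketch of what the auth argument produces, the request can be prepared without being sent; it also shows that a plain (username, password) tuple is accepted as shorthand for HTTPBasicAuth.

```python
import base64
import requests

url = 'http://120.27.34.24:9001'

# A (username, password) tuple is shorthand for HTTPBasicAuth(username, password).
prepared = requests.Request('GET', url, auth=('user', '123')).prepare()
print(prepared.headers['Authorization'])

# The header value is "Basic " followed by base64("username:password").
expected = 'Basic ' + base64.b64encode(b'user:123').decode('ascii')
print(prepared.headers['Authorization'] == expected)  # True
```

Since base64 is an encoding and not encryption, these credentials are only protected in transit when the URL uses https.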

3.8. Common request exception types

When you are not sure which errors may occur, use try...except to catch them. RequestException is the base class of all requests exceptions, so list the more specific exceptions first and RequestException last:

import requests
from requests.exceptions import ReadTimeout, ConnectionError, RequestException
try:
    response = requests.get("http://httpbin.org/get", timeout=0.5)
    print(response.status_code)
except ReadTimeout:
    print('Timeout')
except ConnectionError:
    print('Connection error')
except RequestException:
    print('Error')
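
The same pattern can be wrapped in a small helper so that a crawler keeps running when a single URL fails. This is a minimal sketch: the name safe_get is hypothetical, and the call at the end deliberately uses a URL without a scheme, which requests rejects before anything is sent over the network.

```python
import requests
from requests.exceptions import RequestException

def safe_get(url, timeout=5):
    """Return the Response, or None if any requests exception occurs."""
    try:
        return requests.get(url, timeout=timeout)
    except RequestException as exc:
        print('request failed:', type(exc).__name__)
        return None

# A URL without a scheme raises MissingSchema, a RequestException subclass.
result = safe_get('not-a-url')
print(result)  # None
```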

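Copyright notice

This article represents only the author's views and does not represent Baidu's position.
This article is published on Baidu Baijia with the author's authorization and may not be reproduced without permission.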
