Requests is a simple, easy-to-use HTTP library for Python that is far more concise to work with than urllib. It is a third-party library written in Python, built on urllib3, and released under the Apache 2.0 license.
Official Requests documentation: http://docs.python-requests.org/zh_CN/latest/user/quickstart.html
1. Sending GET requests with requests and common response attributes
1.1. Sending a GET request without parameters
1. A basic GET request with requests
import requests
response = requests.get('http://httpbin.org/get')
print(response.text)  # response.text holds the response body as a string
'''Result:
{
  "args": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "close",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.20.1"
  },
  "origin": "114.221.2.90",
  "url": "http://httpbin.org/get"
}
'''
1.2. Sending a GET request with parameters
1. A GET request with query parameters
Method 1: write the query string directly into the URL
import requests
response = requests.get("http://httpbin.org/get?name=germey&age=22")
print(response.text)
Method 2: pass a dict through the params argument
import requests
data = {
    'name': 'germey',
    'age': 22
}
response = requests.get("http://httpbin.org/get", params=data)
print(response.text)
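To see why the two methods above are equivalent, here is a rough standard-library sketch of how a params dict becomes a query string. It only approximates what requests does internally, and build_url is a hypothetical helper name, not part of requests.

```python
from urllib.parse import urlencode


def build_url(base_url, params):
    """Append URL-encoded query parameters to base_url (hypothetical helper)."""
    # urlencode converts non-string values such as the int 22 with str()
    query = urlencode(params)
    return base_url + "?" + query if query else base_url


url = build_url("http://httpbin.org/get", {"name": "germey", "age": 22})
print(url)  # http://httpbin.org/get?name=germey&age=22
```

The printed URL is the same string that Method 1 writes by hand, which is why httpbin reports identical args for both requests.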
1.3. Parsing JSON with requests
1. Parsing a JSON response
import requests
import json
response = requests.get("http://httpbin.org/get")
print(type(response.text))
print('-------------------------------')
print(response.json())  # parse the response body as JSON
print('-------------------------------')
print(json.loads(response.text))  # json.loads on the body text gives the same result as above
print('-------------------------------')
print(type(response.json()))
'''Result:
<class 'str'>
-------------------------------
{'args': {}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Connection': 'close', 'Host': 'httpbin.org', 'User-Agent': 'python-requests/2.20.1'}, 'origin': '114.221.2.90', 'url': 'http://httpbin.org/get'}
-------------------------------
{'args': {}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Connection': 'close', 'Host': 'httpbin.org', 'User-Agent': 'python-requests/2.20.1'}, 'origin': '114.221.2.90', 'url': 'http://httpbin.org/get'}
-------------------------------
<class 'dict'>
'''
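Both response.json() and json.loads raise an error when the body is not valid JSON, for example when a site answers with an HTML error page. A minimal sketch of defensive parsing, using only the standard library on a body string; parse_json_or_none is a hypothetical helper name and the sample strings are made up for illustration.

```python
import json


def parse_json_or_none(text):
    """Return the parsed JSON value, or None if text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


good = parse_json_or_none('{"args": {}, "url": "http://httpbin.org/get"}')
bad = parse_json_or_none("<html>400 Bad Request</html>")
print(type(good), bad)  # <class 'dict'> None
```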
1.4. Fetching a page as text or binary data with a GET request
1. Getting the response in both text and binary form
import requests
response = requests.get("https://github.com/favicon.ico")
print(type(response.text), type(response.content))  # <class 'str'> <class 'bytes'>
print(response.text)  # the body decoded to a str
print(response.content)  # the raw body as bytes
2. Saving the data returned by a GET request to a local file
import requests
response = requests.get("https://github.com/favicon.ico")
with open('./favicon.ico', 'wb') as f:  # save the response body to the current directory
    f.write(response.content)  # the with block closes the file, so no f.close() is needed
1.5. Adding headers to a GET request
Crawlers generally need to send a headers argument, because many sites reject requests that do not look like they come from a browser; the key header is User-Agent. For example, fetching the Zhihu Explore page below without headers returns an error page:
import requests
response = requests.get("https://www.zhihu.com/explore")
print(response.text)
'''
<html>
<head><title>400 Bad Request</title></head>
<body bgcolor="white">
<center><h1>400 Bad Request</h1></center>
<hr><center>openresty</center>
</body>
</html>
'''
With a User-Agent header added to the GET request, Zhihu can be accessed normally:
import requests
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'
}
response = requests.get("https://www.zhihu.com/explore", headers=headers)
print(response.text)
1.6. Reading common attributes of a GET response
import requests
response = requests.get('https://www.baidu.com/')
print(type(response))  # type of the response object
print('--------------------------------------')
print(response.status_code)  # HTTP status code of the response
print('--------------------------------------')
print(type(response.text))
print('--------------------------------------')
print(response.text)
print('--------------------------------------')
print(response.cookies)
'''Result:
<class 'requests.models.Response'>
--------------------------------------
200
--------------------------------------
<class 'str'>
--------------------------------------
<!DOCTYPE html>
<!--STATUS OK--><html> <head><meta http-equiv=content-type content=text/html;charset=utf-8>
....... content omitted ...................
</body> </html>
--------------------------------------
<RequestsCookieJar[<Cookie BDORZ=27315 for .baidu.com/>]>
'''
2. Sending POST requests with requests, which works much like GET
2.1. Sending a POST request: pass the form data as a dict
import requests
data = {'name': 'germey', 'age': '22'}
response = requests.post("http://httpbin.org/post", data=data)
print(response.text)
'''Result:
{
  "args": {},
  "data": "",
  "files": {},
  "form": {
    "age": "22",
    "name": "germey"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "close",
    "Content-Length": "18",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.19.1"
  },
  "json": null,
  "origin": "114.221.2.90",
  "url": "http://httpbin.org/post"
}
'''
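The Content-Type and Content-Length headers in this output follow from how the data dict is encoded into the request body. As a rough standard-library illustration (an approximation of form encoding, not the code requests itself runs):

```python
from urllib.parse import urlencode

data = {"name": "germey", "age": "22"}

# Form encoding (application/x-www-form-urlencoded) joins key=value pairs with '&'
body = urlencode(data)
print(body)       # name=germey&age=22
print(len(body))  # 18, matching the Content-Length header in the output
```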
2.2. Sending a POST request with headers
import requests
data = {'name': 'germey', 'age': '22'}
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36'
}
response = requests.post("http://httpbin.org/post", data=data, headers=headers)
print(response.json())
'''Result:
{'args': {}, 'data': '', 'files': {}, 'form': {'age': '22', 'name': 'germey'}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Connection': 'close', 'Content-Length': '18', 'Content-Type': 'application/x-www-form-urlencoded', 'Host': 'httpbin.org', 'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36'}, 'json': None, 'origin': '114.221.2.90', 'url': 'http://httpbin.org/post'}
'''
2.3. Common response attributes
import requests
response = requests.get('http://www.jianshu.com')
print(type(response.status_code), response.status_code)  # HTTP status code
print(type(response.headers), response.headers)  # response headers
print(type(response.cookies), response.cookies)  # cookies set by the server
print(type(response.url), response.url)  # final URL of the response
print(type(response.history), response.history)  # earlier responses in a redirect chain
3. Other common uses of the requests library
3.1. File upload
1. Uploading favicon.ico from the current directory to a remote server
import requests
with open('./favicon.ico', 'rb') as f:  # the with block closes the file after the upload
    files = {'file': f}
    response = requests.post("http://httpbin.org/post", files=files)
print(response.text)
'''Result:
{
  "args": {},
  "data": "",
  "files": {
    "file": "data:application/octet-stream;base64,AAABAAIAEBAAAAEAIAAoBQAAJgAAACAgAAABACAAKBQAAE4FAAAoAAAAEAAAACAAAAABACAAAAA
....... content omitted ...................
  },
  "form": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "close",
    "Content-Length": "6665",
    "Content-Type": "multipart/form-data; boundary=64baa48fe6e9aa9985fd4758bc97f1e9",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.20.1"
  },
  "json": null,
  "origin": "114.221.2.90",
  "url": "http://httpbin.org/post"
}
'''
3.2. Getting a site's cookies
import requests
response = requests.get("https://www.baidu.com")
print(response.cookies)
for key, value in response.cookies.items():
    print(key + '=' + value)
'''Result:
<RequestsCookieJar[<Cookie BDORZ=27315 for .baidu.com/>]>
BDORZ=27315
'''
3.3. Simulating a login with Session()
import requests
s = requests.Session()
s.get('http://httpbin.org/cookies/set/number/123456789')  # the session stores the cookie set here
response = s.get('http://httpbin.org/cookies')  # and sends it again with this request
print(response.text)
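3.4. Certificate verification
1. Some sites use a certificate that requests cannot verify, and accessing such a site directly raises an error. As shown below, requesting the 12306 site raises SSLError:
import requests
response = requests.get('https://www.12306.cn')
print(response.status_code)
2. In that case you can pass verify=False to skip certificate verification, and call urllib3.disable_warnings() to suppress the warning that skipping verification produces:
import requests
from requests.packages import urllib3
urllib3.disable_warnings()
response = requests.get('https://www.12306.cn', verify=False)  # turn certificate verification off
print(response.status_code)
3. You can also supply a local client certificate and key through the cert argument when sending the GET request, as shown below:
import requests
response = requests.get('https://www.12306.cn', cert=('./server.crt', './key'))
print(response.status_code)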
3.5. Proxy settings in requests
1. Setting up a proxy
import requests
proxies = {
    "http": "http://127.0.0.1:9743",
    "https": "https://127.0.0.1:9743",
}
response = requests.get("https://www.taobao.com", proxies=proxies)
print(response.status_code)
2. Proxy setup, method 2: a proxy that requires a username and password
import requests
proxies = {
    "http": "http://user:password@127.0.0.1:9743/",
    # the target URL below is https, so an "https" entry is needed for the proxy to be used
    "https": "http://user:password@127.0.0.1:9743/",
}
response = requests.get("https://www.taobao.com", proxies=proxies)
print(response.status_code)
3.6. Timeouts and exception handling
import requests
from requests.exceptions import ReadTimeout
try:
    response = requests.get("http://httpbin.org/get", timeout=0.8)
    print(response.status_code)
    print('-----------------exception divider-------------------')
except ReadTimeout:
    print('Ha ha, Timeout')
'''Test result 1:
200
-----------------exception divider-------------------
'''
'''Test result 2:
Ha ha, Timeout
'''
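3.7. Authentication
Some sites present a login prompt first and require a username and password before you can do anything else. In that case, pass the credentials through the auth argument, as shown below:
import requests
from requests.auth import HTTPBasicAuth
r = requests.get('http://120.27.34.24:9001', auth=HTTPBasicAuth('user', '123'))
print(r.status_code)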
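HTTP Basic authentication sends the username and password Base64-encoded in an Authorization header. A minimal standard-library sketch of how such a header value is formed; basic_auth_header is a hypothetical helper name, and the credentials are the placeholder values from the example above.

```python
import base64


def basic_auth_header(username, password):
    """Build the value of an HTTP Basic Authorization header (hypothetical helper)."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return "Basic " + token


header = basic_auth_header("user", "123")
print(header)  # Basic dXNlcjoxMjM=
```

Base64 is an encoding, not encryption, so anyone who can read the request can recover the password; Basic authentication is best used over HTTPS.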
3.8. Common request exception types
When you are unsure which errors may occur, wrap the call in try...except. The handlers below run from the most specific exception down to RequestException, the base class of all requests exceptions:
import requests
from requests.exceptions import ReadTimeout, ConnectionError, RequestException
try:
    response = requests.get("http://httpbin.org/get", timeout=0.5)
    print(response.status_code)
except ReadTimeout:
    print('Timeout')
except ConnectionError:
    print('Connection error')
except RequestException:
    print('Error')
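Copyright notice
This article represents only the author's views and does not represent Baidu's position.
This article is published on Baidu Baijia with the author's authorization and may not be reproduced without permission.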



