[Use Case] Scraping Tencent Video Comments with an IP Proxy

Published: 2020-05-24

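For a professional crawler engineer, combining a solid command of a crawling language with IP proxies makes it possible to scrape different kinds of information from different websites. Today we walk through a concrete scraping case.

How do you scrape the comments on Tencent Video? Follow IP海 through the steps below:

Open Tencent Video in Firefox, for example https://v.qq.com/x/cover/j6cgzhtkuonf6te.html

Click "查看更多解读" ("View more reviews"). Fiddler will then show a request for a js file: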

[Screenshot: the js file request captured in Fiddler]

The content of that file is the comments.

Pick one comment and decode it:

[Screenshot: the decoded comment]

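Press Ctrl+F in Firefox to check whether this comment appears on the page.

Copy the URL of the js file.

Click "查看更多评论" ("View more comments") to trigger another JSON request, and copy that URL as well.

Compare the two URLs: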
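The decoding step can also be done in a few lines of Python. This is a minimal sketch, assuming the comment text in the response is the body of a JSON string containing \uXXXX escapes. The helper name decode_escaped and the sample string are made up for illustration and are not taken from a real response.

```python
import json


def decode_escaped(text):
    """Decode the body of a JSON string (without its quotes) into readable text."""
    # Wrapping the text in double quotes turns it into a complete JSON string
    # literal, which json.loads can parse without executing anything.
    return json.loads('"' + text + '"')


sample = r"\u4f60\u597d"  # made-up sample, not from a real response
decoded = decode_escaped(sample)
```

Here decoded holds the two characters 你好. Unlike eval, this approach does not run the downloaded text as code, and it joins surrogate pairs (used for emoji) into single characters.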

[Screenshot: the two URLs side by side]

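Try a simplified URL: https://video.coral.qq.com/filmreviewr/c/upcomment/j6cgzhtkuonf6te?reqnum=3&commentid=6227734628246412645

From this analysis, j6cg…… is the video id, reqnum is the number of comments returned per request, and commentid is the comment id, which gives the template: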

　　https://video.coral.qq.com/filmreviewr/c/upcomment/【vid】?reqnum=【num】&commentid=【cid】
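The template above can be filled in programmatically. The sketch below is one way to do it with the standard library; the function name build_review_url is an assumption introduced here, while the base URL and parameter names come from the URL shown in this article.

```python
from urllib.parse import quote, urlencode

BASE = "https://video.coral.qq.com/filmreviewr/c/upcomment/"


def build_review_url(vid, num, cid):
    """Fill the vid / num / cid slots of the URL template."""
    # urlencode handles escaping and accepts numbers as well as strings.
    query = urlencode({"reqnum": num, "commentid": cid})
    return BASE + quote(vid, safe="") + "?" + query


url = build_review_url("j6cgzhtkuonf6te", 3, "6227734628246412645")
```

With the values from this article, url is identical to the simplified URL tested earlier.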

Single-page review scraper

Some special content, such as images, is not handled yet ... that is left for later.

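    import json
    import re
    # uaip is the author's own helper module and is not shown in this article;
    # ua_ip(url) is expected to return the response body as text.
    from uaip import ua_ip

    vid = "j6cgzhtkuonf6te"
    cid = "6227734628246412645"
    num = "3"  # fetch 3 reviews per request

    url = "https://video.coral.qq.com/filmreviewr/c/upcomment/" + vid + "?reqnum=" + num + "&commentid=" + cid
    data = ua_ip(url)

    titlepat = '"title":"(.*?)","abstract":"'
    commentpat = '"content":"(.*?)",'
    titleall = re.compile(titlepat, re.S).findall(data)
    commentall = re.compile(commentpat, re.S).findall(data)
    # print(len(commentall))

    for title, comment in zip(titleall, commentall):
        try:
            # json.loads decodes the \uXXXX escapes safely; eval() would
            # execute the downloaded text and break on any quote inside it.
            print("Review title: " + json.loads('"' + title + '"'))
            print("Review content: " + json.loads('"' + comment + '"'))
            print('---------------')
        except Exception as err:
            print(err)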
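The scripts in this article import ua_ip from a uaip module that is never shown. The sketch below is only a guess at what such a helper might look like, assuming it picks a random User-Agent and a random proxy for each request. The User-Agent strings, the proxy address, and the make_opener helper are all placeholders invented for illustration.

```python
import random
import urllib.request

# Placeholder values for illustration only; replace them with real ones.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0",
    "Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0",
]
PROXIES = ["127.0.0.1:8888"]  # hypothetical proxy address (host:port)


def make_opener(proxy, user_agent):
    """Build an opener that sends HTTP and HTTPS traffic through the proxy."""
    handler = urllib.request.ProxyHandler(
        {"http": "http://" + proxy, "https": "http://" + proxy}
    )
    opener = urllib.request.build_opener(handler)
    opener.addheaders = [("User-Agent", user_agent)]
    return opener


def ua_ip(url, timeout=10):
    """Fetch url with a random User-Agent and proxy; return the body as text."""
    opener = make_opener(random.choice(PROXIES), random.choice(USER_AGENTS))
    with opener.open(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", "ignore")
```

Saved as uaip.py next to the scraper, this would satisfy the import used in the scripts. Building a fresh opener per call keeps each request free to use a different proxy and User-Agent.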

Paginated review scraper

Looking at the source of the response, the value after "last": is the id of the next page.

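    import json
    import re
    # uaip is the author's own helper module and is not shown in this article;
    # ua_ip(url) is expected to return the response body as text.
    from uaip import ua_ip

    vid = "j6cgzhtkuonf6te"
    cid = "6227734628246412645"
    num = "3"

    titlepat = re.compile('"title":"(.*?)","abstract":"', re.S)
    commentpat = re.compile('"content":"(.*?)",', re.S)
    lastpat = re.compile('"last":"(.*?)"', re.S)

    for j in range(10):  # fetch pages 1 to 10
        print("Page " + str(j + 1))
        url = "https://video.coral.qq.com/filmreviewr/c/upcomment/" + vid + "?reqnum=" + num + "&commentid=" + cid
        data = ua_ip(url)
        titleall = titlepat.findall(data)
        commentall = commentpat.findall(data)
        for title, comment in zip(titleall, commentall):
            try:
                # json.loads decodes the \uXXXX escapes without using eval()
                print("Review title: " + json.loads('"' + title + '"'))
                print("Review content: " + json.loads('"' + comment + '"'))
                print('---------------')
            except Exception as err:
                print(err)
        last = lastpat.findall(data)
        if not last:  # no "last" field, so there is no next page to request
            break
        cid = last[0]  # "last" is the id used to request the next page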
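The paging logic can be separated from the network call so that it can be tried offline. The sketch below follows the "last" cursor as described above; the iter_pages and fake_fetch names, the fake responses, and the rule of stopping when the cursor is missing, empty, or unchanged are assumptions made for this illustration, not confirmed behavior of the real endpoint.

```python
import re

BASE = "https://video.coral.qq.com/filmreviewr/c/upcomment/"
LAST_PAT = re.compile(r'"last":"(.*?)"')


def iter_pages(fetch, vid, cid, num, max_pages=10):
    """Yield the raw text of each page, following the "last" cursor."""
    for _ in range(max_pages):
        data = fetch(BASE + vid + "?reqnum=" + str(num) + "&commentid=" + cid)
        yield data
        match = LAST_PAT.search(data)
        if match is None or match.group(1) in ("", cid):
            return  # no cursor, or the cursor did not advance
        cid = match.group(1)


# Made-up responses keyed by cursor, so the sketch runs without a network.
FAKE_PAGES = {
    "100": '{"last":"200","title":"first"}',
    "200": '{"last":"300","title":"second"}',
    "300": '{"last":"","title":"third"}',
}


def fake_fetch(url):
    """Stand-in for ua_ip: look the page up by the commentid in the URL."""
    return FAKE_PAGES[url.rsplit("=", 1)[1]]


pages = list(iter_pages(fake_fetch, "demo", "100", 3))
```

With the fake data, pages contains three entries and the loop stops on the empty cursor. In real use, ua_ip would be passed in place of fake_fetch.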

The method for short comments (ordinary comments) is similar, so it is not repeated in detail here. The URL and the short-comment scraper code are given below.

Take https://video.coral.qq.com/varticle/1743283224/comment/v2?callback=_varticle1743283224commentv2&orinum=10&oriorder=o&pageflag=1&cursor=6442954225602101929&scorecursor=0&orirepnum=2&reporder=o&reppageflag=1&source=132&_=1566363507957

and simplify it to: https://video.coral.qq.com/varticle/1743283224/comment/v2?orinum=10&oriorder=o&pageflag=1&cursor=6442954225602101929
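The simplification above amounts to keeping four query parameters and dropping the rest. A small sketch of that step with urllib.parse follows; the simplify function and the KEEP tuple are names introduced here for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# The parameters that survive in the simplified URL from this article.
KEEP = ("orinum", "oriorder", "pageflag", "cursor")


def simplify(url, keep=KEEP):
    """Return url with every query parameter not listed in keep removed."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query) if k in keep]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(query), ""))


full = (
    "https://video.coral.qq.com/varticle/1743283224/comment/v2"
    "?callback=_varticle1743283224commentv2&orinum=10&oriorder=o&pageflag=1"
    "&cursor=6442954225602101929&scorecursor=0&orirepnum=2&reporder=o"
    "&reppageflag=1&source=132&_=1566363507957"
)
short = simplify(full)
```

Here short matches the simplified URL given above, with the remaining parameters in their original order.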

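    import json
    import re
    # uaip is the author's own helper module and is not shown in this article;
    # ua_ip(url) is expected to return the response body as text.
    from uaip import ua_ip

    vid = "1743283224"
    cid = "6442954225602101929"
    num = "5"

    commentpat = re.compile('"content":"(.*?)"', re.S)
    lastpat = re.compile('"last":"(.*?)"', re.S)

    for j in range(10):  # fetch pages 1 to 10
        print("Page " + str(j + 1))
        url = "https://video.coral.qq.com/varticle/" + vid + "/comment/v2?orinum=" + num + "&oriorder=o&pageflag=1&cursor=" + cid
        data = ua_ip(url)
        commentall = commentpat.findall(data)
        # print(len(commentall))
        for comment in commentall:
            try:
                # json.loads decodes the \uXXXX escapes without using eval()
                print("Comment content: " + json.loads('"' + comment + '"'))
                print('---------------')
            except Exception as err:
                print(err)
        last = lastpat.findall(data)
        if not last:  # no "last" field, so there is no next page to request
            break
        cid = last[0]  # "last" is the cursor for the next page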

With that, we have scraped the comments from Tencent Video. Try it yourself and see the results.
