

Using a Python crawler to download funny animated GIFs: an example

Updated: 2020-06-16 21:48:01  Author: startmvc

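This article shows, by example, how to write a Python crawler that downloads funny animated GIF images. It is shared here for reference; the details follow.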

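Sometimes you come across animated GIFs you like, but saving them one by one is tedious, and some sites do not even allow right-click saving. Fetching them with Python solves both problems, and it is fun to watch the crawler work.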

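The site crawled here is 居然搞笑网 (zbjuran.com); its GIF listing pages look like http://www.zbjuran.com/dongtai/list_4_1.html, where the trailing number is the page number.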

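Approach:

Fetch the content of the current page.

Find the URLs of the animated images on the page.

Save the content at each URL to a local file.

To crawl several pages, wrap these steps in a loop.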
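The four steps above can be sketched offline as follows. This is a minimal illustration, not the article's code: it swaps BeautifulSoup for the standard-library `html.parser`, and the `SAMPLE` markup (div.main > div.item, title in h3, GIF in div.text > img) is a hypothetical page modeled on the selectors the crawler in this article uses; the real site's markup may differ. The `fetch` and `save` hooks are hypothetical too, passed in by the caller so the loop runs without a network.

```python
from html.parser import HTMLParser


class GifListParser(HTMLParser):
    """Collect {'name', 'url'} dicts from div.item blocks inside div.main.

    Assumed (hypothetical) layout: div.item > h3 holds the title and
    div.item > div.text > img holds the GIF; items without that img are ads.
    """

    def __init__(self):
        super().__init__()
        self.items = []
        self.div_stack = []   # class lists of the <div>s that are currently open
        self.current = None   # item being filled in, or None outside an item
        self.item_depth = 0   # len(div_stack) at the item's own <div>
        self.in_h3 = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get('class') or '').split()
        if tag == 'div':
            self.div_stack.append(classes)
            inside_main = any('main' in c for c in self.div_stack[:-1])
            if 'item' in classes and inside_main and self.current is None:
                self.current = {'name': '', 'url': None}
                self.item_depth = len(self.div_stack)
        elif self.current is None:
            return
        elif tag == 'h3':
            self.in_h3 = True
        elif tag == 'img' and self.current['url'] is None:
            if any('text' in c for c in self.div_stack):
                self.current['url'] = attrs.get('src')

    def handle_endtag(self, tag):
        if tag == 'h3':
            self.in_h3 = False
        elif tag == 'div' and self.div_stack:
            if self.current is not None and len(self.div_stack) == self.item_depth:
                if self.current['url']:   # keep only items that had an image
                    self.items.append(self.current)
                self.current = None
            self.div_stack.pop()

    def handle_data(self, data):
        if self.in_h3 and self.current is not None:
            self.current['name'] += data.strip()


def find_gifs(html):
    """Step 2: return the GIFs ({'name', 'url'}) listed on one page."""
    parser = GifListParser()
    parser.feed(html)
    parser.close()
    return parser.items


def crawl(pages, fetch, save):
    """Steps 1-4: fetch each listing page, find its GIFs and save them.

    fetch(url) -> str or None and save(name, url, page) are hypothetical
    hooks supplied by the caller, so the sketch runs without a network.
    """
    saved = 0
    for page in pages:                            # step 4: loop over pages
        url = 'http://www.zbjuran.com/dongtai/list_4_%d.html' % page
        html = fetch(url)                         # step 1: get the page
        if not html:
            continue
        for gif in find_gifs(html):               # step 2: find the GIF URLs
            save(gif['name'], gif['url'], page)   # step 3: save each one
            saved += 1
    return saved


# Hypothetical listing-page markup used only for this offline demo
SAMPLE = '''
<div class="main">
  <div class="item"><h3>Title A</h3>
    <div class="text"><img src="http://example.com/a.gif"></div></div>
  <div class="item ad"><h3>Ad</h3><a href="#">buy now</a></div>
  <div class="item"><h3>Title B</h3>
    <div class="text"><p><img src="http://example.com/b.gif"/></p></div></div>
</div>
'''

if __name__ == '__main__':
    print(find_gifs(SAMPLE))
```

Replacing `fetch` with a function that actually downloads the page and `save` with a download function turns the sketch into a live crawler; the standard-library parser is used here only so the example has no third-party dependencies.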

Code:


#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# Python 3; requires beautifulsoup4 and lxml
import os
import re
import sys
import time
import uuid
import urllib.parse
import urllib.request
from bs4 import BeautifulSoup

# Fetch the page content (raw bytes); return None on failure
def getHtml(url):
    print(url)
    try:
        return urllib.request.urlopen(url, timeout=10).read()
    except Exception as e:
        print('Failed to fetch %s: %s' % (url, e))
        return None

# Get the list of animated images on the page, as {'name': title, 'url': image URL}
def getImagUrl(html):
    ImagUrlList = []
    if not html:
        print('nothing can be found')
        return ImagUrlList
    soup = BeautifulSoup(html, 'lxml')  # bs4 detects the page encoding itself
    main = soup.find('div', {'class': 'main'})
    if main is None:
        return ImagUrlList
    # Walk the list of items
    for item in main.find_all('div', {'class': 'item'}):
        text = item.find('div', {'class': 'text'})
        img = text.find('img') if text else None
        title = item.find('h3')
        # Skip ad entries, which have no image inside div.text
        if img is None or not img.get('src') or title is None:
            continue
        ImagUrlList.append({'url': img.get('src'), 'name': title.text.strip()})
    return ImagUrlList

# Download one image into Jimy/<date>/<typename>/<pageNo>/
def download(author, imgurl, typename, pageNo):
    # Name the folder after today's date, e.g. 2017-3-16
    x = time.localtime()
    foldername = '%d-%d-%d' % (x.tm_year, x.tm_mon, x.tm_mday)
    picpath = 'Jimy/%s/%s/%s' % (foldername, typename, pageNo)
    # Drop characters not allowed in file names (a '/' in a title would create a subfolder)
    filename = re.sub(r'[\\/:*?"<>|]', '', author) + str(uuid.uuid1())
    # Take the extension from the URL path instead of assuming the last 3 characters
    pic_type = os.path.splitext(urllib.parse.urlparse(imgurl).path)[1] or '.gif'
    os.makedirs(picpath, exist_ok=True)
    target = '%s/%s%s' % (picpath, filename, pic_type)
    print('GIF saved to: ' + target)
    download_img = urllib.request.urlretrieve(imgurl, target)  # download the image to the target path
    print('Image source: ' + imgurl)
    return download_img

# Exit the program
def myquit():
    print('Bye Bye!')
    sys.exit(0)

# Crawl one listing page and download every GIF on it
def start(pageNo):
    targeturl = 'http://www.zbjuran.com/dongtai/list_4_%s.html' % pageNo
    html = getHtml(targeturl)
    for imgurl in getImagUrl(html):
        download(imgurl['name'], imgurl['url'], '搞笑动图', pageNo)  # folder name: "funny GIFs"

# Ask until the input is an integer in [low, high]; 'Q' or 'quit' exits
def askPage(prompt, low, high):
    pageNo = input(prompt).strip()
    while not pageNo.isdigit() or not low <= int(pageNo) <= high:
        if pageNo in ('Q', 'quit'):
            myquit()
        print('Param is invalid, please try again.')
        pageNo = input('Input the page number you want to scrape >').strip()
    return int(pageNo)

if __name__ == '__main__':
    print('''
    *****************************************
    **       Welcome to Spider of GIF      **
    **       Created on 2017-3-16          **
    **       @author: Jimy                 **
    *****************************************''')
    # First run: crawl a single page
    pageNo = askPage("Input the page number you want to scrape (1-50), "
                     "or enter 'Q' or 'quit' to quit\n>", 1, 50)
    print(pageNo)
    start(pageNo)
    # Second run: crawl pages 1 to N
    total = askPage("Input the total number of pages you want to scrape (1-5000), "
                    "or enter 'Q' or 'quit' to quit\n>", 1, 5000)
    # Loop over the pages to crawl several of them
    for num in range(1, total + 1):
        start(num)

The output looks like this:

*****************************************
**    Welcome to Spider of GIF         **
**      Created on 2017-3-16           **
**      @author: Jimy                  **
*****************************************
Input the page number you want to scratch (1-50),please input 'quit' if you want to quit
Enter the page to crawl, range (1-100); to quit, enter Q>
>1
1
http://www.zbjuran.com/dongtai/list_4_1.html
GIF saved to: Jimy/2017-3-16/搞笑动图/1/真是艰难的选择。3f0fe8f6-09f8-11e7-9161-f8bc12753d1e.gif
Image source: http://www.zbjuran.com/uploads/allimg/170206/10-1F206135ZHJ.gif
GIF saved to: Jimy/2017-3-16/搞笑动图/1/这么贱会被打死吧……3fa9da88-09f8-11e7-9161-f8bc12753d1e.gif
Image source: http://www.zbjuran.com/uploads/allimg/170206/10-1F206135H35U.gif
GIF saved to: Jimy/2017-3-16/搞笑动图/1/一看就是印度……4064e60c-09f8-11e7-9161-f8bc12753d1e.gif
Image source: http://www.zbjuran.com/uploads/allimg/170206/10-1F20613543c50.gif
GIF saved to: Jimy/2017-3-16/搞笑动图/1/新垣结衣的正经工作脸414b4f52-09f8-11e7-9161-f8bc12753d1e.gif
Image source: http://www.zbjuran.com/uploads/allimg/170206/10-1F206135250553.gif
GIF saved to: Jimy/2017-3-16/搞笑动图/1/妹子这是在摇什么的421afa86-09f8-11e7-9161-f8bc12753d1e.gif
Image source: http://www.zbjuran.com/uploads/allimg/170206/10-1F20613493N03.gif
Input the page number you want to scratch (1-50),please input 'quit' if you want to quit
Enter the total number of pages to crawl, range (1-5000); to quit, enter Q>
>Q
Bye Bye!

In the end, the animated GIFs are saved locally under the Jimy folder.
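Because the crawler takes the file extension from the image URL, a saved ".gif" could in principle be an error page or some other format. One way to check is the file header: every GIF file starts with the six bytes `GIF87a` or `GIF89a`. A minimal sketch, assuming the downloads live under the `Jimy` folder; `is_gif` and `find_non_gifs` are hypothetical helper names, not part of the article's script.

```python
import os

# The two header versions defined by the GIF specification
GIF_SIGNATURES = (b'GIF87a', b'GIF89a')


def is_gif(path):
    """Return True if the file at path starts with a GIF header."""
    with open(path, 'rb') as f:
        return f.read(6) in GIF_SIGNATURES


def find_non_gifs(root='Jimy'):
    """Walk root and return (sorted) paths named *.gif that are not GIF files."""
    bad = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if name.lower().endswith('.gif') and not is_gif(path):
                bad.append(path)
    return sorted(bad)


if __name__ == '__main__':
    for path in find_non_gifs():
        print('not a GIF:', path)
```

A header check only confirms the format; it does not tell whether the GIF actually has more than one frame.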
