Python single-threaded vs. multi-threaded vs. multi-process image downloading (I/O-bound): a timing comparison
Published: 2019-05-27


Everyone talks about Python's GIL all the time, but few of us ever actually benchmark its effects.

Today I measured how fast Python downloads images using multiple threads, multiple processes, and a single thread.

The tests confirm that for I/O-bound work, Python's multi-threading really is much faster than a single thread. Quoting another blogger's explanation of why:

I/O divides into network I/O and disk I/O, and an I/O operation generally has a send phase (output) and a receive phase (input). Take a browser as an example: the browser sends a request to the server (output), and the server returns the result to the browser (input). While a thread is blocked on I/O, Python releases the GIL (global interpreter lock), so while the current thread waits for its response (blocked), a second thread can go ahead and send its own request (output), and a third thread can send its request while the second is blocked, and so on. In any given time slice, one thread is waiting for data while another is sending data, which overlaps the I/O transfers and cuts the total time.

--------------------- 
Author: daijiguo 
Source: CSDN 
Original post: https://blog.csdn.net/daijiguo/article/details/78042309 
Copyright notice: the quoted passage is that blogger's original work; please include a link to the source when reposting.
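The GIL-release behavior described above can be seen without any network at all. This is a minimal sketch, where the hypothetical `fake_io` helper uses `time.sleep` as a stand-in for a blocked socket read (sleep releases the GIL the same way blocking I/O does):

```python
import time
from threading import Thread

def fake_io(seconds):
    # time.sleep releases the GIL just like a blocked network read,
    # so the other threads keep running while this one waits
    time.sleep(seconds)

start = time.time()
threads = [Thread(target=fake_io, args=(0.5,)) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start

# five 0.5 s waits overlap, so the total is close to 0.5 s, not 2.5 s
print('elapsed: %.2f s' % elapsed)
```

If the GIL were held during the wait, the five threads would serialize and the total would approach 2.5 seconds.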

As for multi-threading vs. multi-processing: when downloading many images (more images than CPU cores), each of them small, multi-threading appears to be faster.

My guess at the reason: since there are more images than cores, both threads and processes have to be switched. Processes switch less often, but each process switch is far more expensive; threads are cheap to switch, and because each download is small, the threads don't need many switches either, even though the GIL keeps only one core working at a time. So multi-threading comes out ahead.

When downloading only a few images (fewer images than CPU cores), each of them large, multi-processing can keep every core busy without any process switching, which cuts overhead. The threads, by contrast, keep switching between tasks, and with large files that constant switching adds overhead and drags the speed below multi-processing.

That's my current understanding, anyway...
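The "process overhead is larger" part of that guess is easy to check directly. This is a rough sketch, with a hypothetical `measure` helper that starts and joins a batch of do-nothing workers, so almost all of the measured time is creation/teardown overhead (exact numbers will vary by OS and start method):

```python
import time
from threading import Thread
from multiprocessing import Process

def noop():
    pass

def measure(worker_cls, n=20):
    # start and join n no-op workers; with nothing to do,
    # the wall-clock time is almost pure creation/teardown cost
    start = time.time()
    workers = [worker_cls(target=noop) for _ in range(n)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return time.time() - start

thread_cost = measure(Thread)
process_cost = measure(Process)
print('20 threads: %.4f s, 20 processes: %.4f s' % (thread_cost, process_cost))
```

On a typical machine the process batch costs noticeably more, since each `Process` is a full OS process with its own interpreter.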

Someday I'll run the same multi-thread vs. multi-process comparison for a CPU-bound workload...

 

```python
import requests
import time
from threading import Thread
import threading
import multiprocessing

python_list = [
    'https://www.python.org/ftp/python/3.5.7/Python-3.5.7.tgz',
    'https://www.python.org/ftp/python/3.7.3/python-3.7.3.exe',
    'https://www.python.org/ftp/python/2.7.16/python-2.7.16.amd64.msi',
]
large_url_list = [
    # python.org downloads: not huge, but slow from an overseas mirror
    'https://www.python.org/ftp/python/3.7.3/python-3.7.3.exe',
    # 'https://raw.githubusercontent.com/mymmsc/books/master/%E7%AE%97%E6%B3%95%E5%AF%BC%E8%AE%BA%E4%B8%AD%E6%96%87%E7%89%88.pdf',
    'http://codown.youdao.com/cidian/YoudaoDict_webdict_default.exe',
    'https://down.360safe.com/setup.exe',
    'https://d1.music.126.net/dmusic/cloudmusicsetup_2.5.2.197409.exe',
    'http://pcclient.download.youku.com/youkuclient/youkuclient_setup_7.7.7.4191.exe',
    'http://dl2.xmind.cn/xmind-8-update8-windows.exe',
    'https://cdn-dl.yinxiang.com/YXWin6/public/Evernote_6.17.20.667.exe',
]
url_list = [
    'https://images7.alphacoders.com/333/333388.jpg',
    'https://images2.alphacoders.com/597/597309.jpg',
    'https://images8.alphacoders.com/562/562449.jpg',
    'https://images.alphacoders.com/562/562450.jpg',
    'https://images3.alphacoders.com/562/562451.jpg',
    'https://images.alphacoders.com/562/562452.jpg',
    'https://images2.alphacoders.com/101/1011957.jpg',
    'https://images6.alphacoders.com/101/1011958.jpg',
    'https://images5.alphacoders.com/101/1011959.jpg',
    'https://images8.alphacoders.com/101/1011961.jpg',
    'https://images3.alphacoders.com/692/692439.jpg',
    'https://images4.alphacoders.com/940/940881.jpg',
    'https://images5.alphacoders.com/689/689398.jpg',
    'https://images5.alphacoders.com/757/757038.jpg',
]
time_path = 'time_compare.txt'

# fetch one URL and save the response body to disk
def save_pic(url, count):
    file_name = str(count + 1) + '.jpg'
    res = requests.get(url)
    print(len(res.content) // 1024 // 1024, url)
    with open(file_name, 'wb') as f:
        f.write(res.content)

# single thread: download the URLs one after another
def single_download(url_list):
    s_time = time.time()
    for i in range(len(url_list)):
        res = requests.get(url_list[i])
        print(len(res.content) // 1024 // 1024)
        file_name = str(i + 1) + '.jpg'
        with open(file_name, 'wb') as f:
            f.write(res.content)
    t_time = time.time() - s_time
    with open(time_path, 'a') as f:
        f.write('single-thread total: %r' % t_time + '\n\n')
    print('single-thread total: %r' % t_time)

# multi-thread: one thread per URL, started together, then joined
def thread_download(save_pic, url_list):
    threads = []
    start = time.time()
    for i in range(len(url_list)):
        t = Thread(target=save_pic, args=[url_list[i], i])
        t.start()
        threads.append(t)
    # join here, after all threads have started, so they run concurrently
    for t in threads:
        t.join()
    end = time.time()
    print('multi-thread total: %r' % (end - start))
    with open(time_path, 'a') as f:
        f.write('multi-thread total: %r' % (end - start) + '\n')

# multi-process: one process per URL, started together, then joined
def process_download(save_pic, url_list):
    processes = []
    start = time.time()
    for i in range(len(url_list)):
        p = multiprocessing.Process(target=save_pic, args=[url_list[i], i])
        p.start()
        processes.append(p)
    # join here, after all processes have started, so they run concurrently
    for p in processes:
        p.join()
    end = time.time()
    print('multi-process total: %r' % (end - start))
    with open(time_path, 'a') as f:
        f.write('multi-process total: %r' % (end - start) + '\n')

if __name__ == '__main__':
    thread_download(save_pic, python_list)
    process_download(save_pic, python_list)
    single_download(large_url_list)
```
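The per-URL Thread/Process bookkeeping above can also be written more compactly with a pool from the standard library. A minimal sketch, where the hypothetical `fake_save_pic` simulates the network fetch with a short sleep instead of calling `requests.get`, so it runs without internet access:

```python
import time
from concurrent.futures import ThreadPoolExecutor

# stand-in for save_pic: a 0.2 s sleep simulates the blocking fetch
def fake_save_pic(url):
    time.sleep(0.2)
    return url

urls = ['https://example.com/%d.jpg' % i for i in range(8)]

start = time.time()
with ThreadPoolExecutor(max_workers=8) as pool:
    # map schedules all eight "downloads" onto the pool concurrently
    results = list(pool.map(fake_save_pic, urls))
elapsed = time.time() - start
print('%d downloads in %.2f s' % (len(results), elapsed))
```

The executor handles starting and joining the threads, and swapping in `ProcessPoolExecutor` gives the multi-process variant with the same interface.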

Timing comparison (one line per run):

Multi-thread total: 22.477999925613403  Multi-process total: 31.263000011444092  Single-thread total: 25.10800004005432
Multi-thread total: 21.917999982833862  Multi-process total: 28.180999994277954  Single-thread total: 21.52900004386902
Multi-thread total: 6.33299994468689  Multi-process total: 6.327999830245972  Single-thread total: 21.680999994277954
Multi-thread total: 4.704999923706055  Multi-process total: 7.363000154495239  Single-thread total: 22.16599988937378
Multi-thread total: 4.493000030517578  Multi-process total: 5.243000030517578  Single-thread total: 20.289999961853027
Multi-thread total: 7.164999961853027  Multi-process total: 6.3429999351501465  Single-thread total: 40.97699999809265
Multi-thread total: 10.406000137329102  Multi-process total: 11.692000150680542  Single-thread total: 39.74600005149841
Multi-thread total: 11.069999933242798  Multi-process total: 13.827999830245972  Single-thread total: 55.35499978065491
Multi-thread total: 12.45300006866455  Multi-process total: 15.381999969482422
Multi-thread total: 14.733000040054321  Multi-process total: 17.787999868392944
Multi-thread total: 67.04800009727478  Multi-process total: 65.76999998092651
Multi-thread total: 11.710999965667725  Multi-process total: 13.263000011444092
Multi-thread total: 150.0369999408722  Multi-process total: 87.61500000953674
Multi-thread total: 207.85199999809265  Multi-process total: 85.44199991226196
Multi-thread total: 14.031000137329102  Multi-process total: 16.914999961853027  Single-thread total: 16.836000204086304
Multi-thread total: 16.92199993133545  Multi-process total: 24.299000024795532  Single-thread total: 20.825999975204468
Multi-thread total: 24.26200008392334  Multi-process total: 25.591000080108643  Single-thread total: 39.54299998283386
Multi-thread total: 42.15599989891052  Multi-process total: 43.079999923706055
Multi-thread total: 45.169999837875366  Multi-process total: 39.575000047683716
Multi-thread total: 50.48699998855591  Multi-process total: 54.603999853134155
Multi-thread total: 55.680999994277954  Multi-process total: 57.11299991607666
Multi-thread total: 51.34699988365173
Multi-thread total: 68.9359998703003  Multi-process total: 60.924999952316284
Multi-thread total: 53.098999977111816  Multi-process total: 55.61199998855591
Multi-thread total: 52.46000003814697  Multi-process total: 51.26799988746643
Multi-thread total: 226.48599982261658  Multi-process total: 211.4670000076294
Multi-thread total: 11.33299994468689  Multi-process total: 15.307000160217285
Multi-thread total: 11.495000123977661  Multi-process total: 11.54800009727478
Multi-thread total: 9.815999984741211  Multi-process total: 10.997999906539917
Multi-thread total: 162.45900011062622  Multi-process total: 180.01900005340576
Multi-thread total: 214.36699986457825  Multi-process total: 157.90300011634827
Multi-thread total: 152.77100014686584  Multi-process total: 136.43899989128113
Multi-thread total: 108.96199989318848  Multi-process total: 104.80599999427795
Multi-thread total: 81.69500017166138  Multi-process total: 82.85199999809265  Single-thread total: 176.9119999408722