I finally made Knight rank, with 50-odd threads posted: half in the tech section, half in the New Era section.
Thanks to V大 for adding 10 points to my last tech post.
How to find pictures for the New Era section puzzled me for a long time; in the end I worked out a method of my own, which I'm sharing here.
If anything breaks the rules, please delete this post.
Using the overseas site Pornpics as an example, here is the picture-finding workflow.
Prerequisite: a proxy or VPN that can reach blocked sites.
1. 提取图集链接
Quote:
Use a Python script to extract the image links. A gallery must contain at least 20 pictures and must not duplicate any gallery used before, to avoid reposting the same set.
Quote:
#!/usr/bin/python3
import requests
from bs4 import BeautifulSoup
import json
import logging

logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
# create file handler which logs even debug messages
fh = logging.FileHandler('pornpic.log')
fh.setLevel(logging.DEBUG)
# create console handler with a higher log level
ch = logging.StreamHandler()
ch.setLevel(logging.WARNING)
# create formatter and add it to the handlers
formatter = logging.Formatter('%(asctime)s - %(message)s')
fh.setFormatter(formatter)
ch.setFormatter(formatter)
# add the handlers to the logger
logger.addHandler(fh)
logger.addHandler(ch)
logger.info('You can find this written in pornpic.log')

proxies = {'http': 'http://127.0.0.1:1080', 'https': 'http://127.0.0.1:1080'}


class Random_PornPic():
    def __init__(self, url):
        self.url = url
        self.history = self.load_local_his()
        self.gallery_info = dict()
        # history format: {"gallery_url": gallery_info}
        # gallery_info format: {'name': name,
        #                       'length': length,
        #                       'gallery_pics': gallery_pics}

    def parse_gallery(self, url):
        if url == "":
            return []
        logger.info("Parsing the gallery")
        try:
            r = requests.get(url, proxies=proxies, verify=False)
            assert r.status_code == 200
            soup = BeautifulSoup(r.text, 'lxml')
            result = soup.select(".thumbwook > .rel-link")
            return [x['href'] for x in result]
        except Exception:
            logger.info("Parse Gallery Failed")
            return []

    # reject duplicates and galleries with too few pictures
    def check_useful(self, url, limit=20):
        self.history = self.load_local_his()
        if url in self.history:
            return False
        gallery_pics = self.parse_gallery(url)
        name = url.split('/')[-2]
        self.gallery_info = {'name': name,
                             'length': len(gallery_pics),
                             'gallery_pics': gallery_pics}
        self.history[url] = self.gallery_info
        self.save_local_his()
        if self.gallery_info['length'] >= limit:
            logger.info(url + " --- VALID")
            return True
        return False

    def load_local_his(self):
        try:
            with open('./history.json', 'r', encoding='utf-8') as f_toc:
                data = json.load(f_toc)
        except Exception:
            data = dict()
        return data

    def save_local_his(self):
        with open('./history.json', 'w', encoding='utf-8') as f_toc:
            json.dump(self.history, f_toc, ensure_ascii=False, indent=4)

    def gen_gallery(self):
        if self.check_useful(self.url):
            logger.info("Found a qualifying gallery")
            with open(self.gallery_info['name'], 'w', encoding='utf-8') as f:
                f.write('\n'.join(self.gallery_info['gallery_pics']))
            logger.warning("Gallery saved successfully")
            print(self.gallery_info)
        else:
            print("Duplicate gallery, or too few pictures")


if __name__ == "__main__":
    tmp_url = input("url is:\n")
    Random_PornPic(tmp_url).gen_gallery()
Running the script above prompts you for the URL of the page you want to extract from, e.g.:
https://www.pornpics.com/galleries/nubilesnet-atenas-andrade-90359301/
It then outputs the links to every picture in the gallery.
2. Upload to an image host
The extracted links point at overseas servers and may be unreachable from inside China, so the pictures need to be re-uploaded to an image host. Here I use https://www.privacypic.com/ as an example.
It lets you upload pictures directly from links: paste the image links extracted in step 1 into the text box and upload.
You will get the image-host addresses back; choose the detailed BBCode format.
Then copy the addresses the image host returns.
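If the host turns out to offer an upload-by-URL API, this step could be scripted too. Below is only a sketch: privacypic does not document a public API, so the endpoint and key here are pure assumptions (they follow the v1 API shape that many Chevereto-based image hosts expose) and would have to be confirmed by inspecting the site's own upload requests.

```python
import requests

# Hypothetical endpoint and key -- NOT confirmed for privacypic; this is
# the Chevereto-style v1 upload API that many image hosts happen to use.
API_URL = "https://www.privacypic.com/api/1/upload"
API_KEY = "your_api_key_here"

def build_payload(image_url, api_key=API_KEY):
    # Chevereto's v1 API accepts the image as 'source' (a remote URL).
    return {'key': api_key, 'source': image_url, 'format': 'json'}

def upload_by_url(image_url):
    # POST the payload and return the hosted image's URL from the JSON reply.
    r = requests.post(API_URL, data=build_payload(image_url), timeout=60)
    r.raise_for_status()
    return r.json()['image']['url']
```

Treat this purely as a starting point for your own analysis of the host's upload traffic.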
3. Fix the formatting and pick a title
There must be a blank line between every two image links.
I use the Notepad++ text editor to make the change in bulk; any other editor will do.
Then give the gallery a fitting title, and it is ready to post.
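The Notepad++ step can also be done with a few lines of Python. A minimal sketch (it assumes the step-2 output, one image address per line, is passed in as a string):

```python
def space_links(text):
    # One address per line in; a blank line between every two addresses out,
    # which is the spacing the forum post needs.
    links = [line.strip() for line in text.splitlines() if line.strip()]
    return '\n\n'.join(links)

# Example: three consecutive lines become three spaced paragraphs.
demo = "https://a.example/1.jpg\nhttps://a.example/2.jpg\nhttps://a.example/3.jpg"
print(space_links(demo))
```

Read the file returned by the image host, run it through `space_links`, and paste the result into the post editor.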
——————
Of course, the three steps can be merged into one and fully automated.
Uploading to the image host can be done through its API (image-host APIs are generally not public, so you have to reverse-engineer them yourself, which is a bit of a hassle), after which you can directly obtain three qualifying gallery link sets. (Only 3 posts are allowed per day.)
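Chained together, the whole pipeline reduces to a couple of functions. A sketch, where `upload_to_host` is a placeholder for whatever upload call your analysis of the image host's API turns up:

```python
def format_bbcode(pic_urls):
    # One blank line between every two [img] tags, as the forum expects.
    return '\n\n'.join('[img]' + u + '[/img]' for u in pic_urls)

def make_post(gallery_pics, upload_to_host):
    # upload_to_host: a callable mapping a source image URL to the hosted
    # URL -- it stands in for the reverse-engineered upload API.
    hosted = [upload_to_host(u) for u in gallery_pics]
    return format_bbcode(hosted)

# With a dummy pass-through uploader, two links become a post body:
print(make_post(['http://x/1.jpg', 'http://x/2.jpg'], lambda u: u))
```

Feed it the `gallery_pics` list the scraper below produces, and the return value is a ready-to-paste post body.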
———–
For example, fetch random Pornpics galleries and keep only those with at least 20 pictures.
Quote:
import requests
from bs4 import BeautifulSoup
import json
import time
import logging

logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
# create file handler which logs even debug messages
fh = logging.FileHandler('pornpic.log')
fh.setLevel(logging.DEBUG)
# create console handler with a higher log level
ch = logging.StreamHandler()
ch.setLevel(logging.WARNING)
# create formatter and add it to the handlers
formatter = logging.Formatter('%(asctime)s - %(message)s')
fh.setFormatter(formatter)
ch.setFormatter(formatter)
# add the handlers to the logger
logger.addHandler(fh)
logger.addHandler(ch)
logger.info('You can find this written in pornpic.log')

# set up the proxy
proxies = {'http': 'http://127.0.0.1:1080', 'https': 'http://127.0.0.1:1080'}


class Random_PornPic():
    def __init__(self, gallery_num):
        self.number = gallery_num
        self.history = self.load_local_his()
        self.gallery_info = dict()
        # history format: {"gallery_url": gallery_info}
        # gallery_info format: {'name': name,
        #                       'length': length,
        #                       'gallery_pics': gallery_pics}

    def parse_gallery(self, url):
        if url == "":
            return []
        logger.info("Parsing the gallery")
        try:
            r = requests.get(url, proxies=proxies, verify=False)
            assert r.status_code == 200
            soup = BeautifulSoup(r.text, 'lxml')
            result = soup.select(".thumbwook > .rel-link")
            return [x['href'] for x in result]
        except Exception:
            logger.info("Parse Gallery Failed")
            return []

    # reject duplicates and galleries with too few pictures
    def check_useful(self, url, limit=20):
        self.history = self.load_local_his()
        if url in self.history:
            return False
        gallery_pics = self.parse_gallery(url)
        name = url.split('/')[-2]
        self.gallery_info = {'name': name,
                             'length': len(gallery_pics),
                             'gallery_pics': gallery_pics}
        self.history[url] = self.gallery_info
        self.save_local_his()
        if self.gallery_info['length'] >= limit:
            logger.info(url + " --- VALID")
            return True
        return False

    def get_random_gallery(self):
        try:
            r = requests.get("https://www.pornpics.com/random/index.php",
                             proxies=proxies, verify=False, timeout=30)
            if r.status_code == 200:
                url = r.json()['link']
                logger.info("get random gallery: " + url)
                time.sleep(30)  # be gentle with the server
                return url
            return ""
        except Exception:
            logger.warning("Fail to get random gallery")
            time.sleep(300)  # back off for five minutes on failure
            return ""

    def load_local_his(self):
        try:
            with open('./history.json', 'r', encoding='utf-8') as f_toc:
                data = json.load(f_toc)
        except Exception:
            data = dict()
        return data

    def save_local_his(self):
        with open('./history.json', 'w', encoding='utf-8') as f_toc:
            json.dump(self.history, f_toc, ensure_ascii=False, indent=4)

    def gen_gallery(self):
        i = 0
        while i < self.number:
            url = self.get_random_gallery()
            if url and self.check_useful(url):
                logger.info("Found a qualifying gallery")
                with open(self.gallery_info['name'], 'w', encoding='utf-8') as f:
                    f.write('\n'.join(self.gallery_info['gallery_pics']))
                logger.warning("Gallery saved successfully")
                i = i + 1
        logger.info("Generated " + str(self.number) + " galleries")


if __name__ == "__main__":
    # generate 3 galleries
    Random_PornPic(3).gen_gallery()
[ This post was last edited by 牛河 on 2020-06-05 08:16 ]
