
python - Using PRAW to scrape subreddit post titles and use them as file names

Reposted. Author: 行者123. Updated: 2023-12-01 08:00:44

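My code currently downloads images from a given subreddit and saves them under their original file names. What I want is for each file to be named after the title it was posted with on Reddit. Can anyone help? I think it has something to do with Submission.title, but I can't figure it out. Cheers.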

import praw
import threading
from requests import get
from multiprocessing.pool import ThreadPool
import os


client_id = 'xxxxxxxxx'
client_secret = 'xxxxxxxxx'
user_agent = 'xxxxxxxxx'
image_directory = 'images'
thread_count = 16

target_subreddit = 'space'
image_count = '10'
order = 'hot'

order = order.lower()

reddit = praw.Reddit(client_id=client_id,
                     client_secret=client_secret, user_agent=user_agent)


def get_order():
    if order == 'hot':
        ready = reddit.subreddit(target_subreddit).hot(limit=None)
    elif order == 'top':
        ready = reddit.subreddit(target_subreddit).top(limit=None)
    elif order == 'new':
        ready = reddit.subreddit(target_subreddit).new(limit=None)
    return ready


def get_img(what):
    image = '{}/{}/{}'.format(image_directory,
                              target_subreddit, what.split('/')[-1])
    img = get(what).content
    with open(image, 'wb') as f:
        f.write(img)


def make_dir():
    directory = f'{image_directory}/{target_subreddit}'
    if not os.path.exists(directory):
        os.makedirs(directory)


def main():
    c = 1
    images = []
    make_dir()
    for submission in get_order():
        url = submission.url
        if url.endswith(('.jpg', '.png', '.gif', '.jpeg')):
            images.append(url)
            c += 1
            if int(image_count) < c:
                break

    results = ThreadPool(thread_count).imap_unordered(get_img, images)
    for path in results:
        pass

    print('Done')

if __name__ == '__main__':
    main()

Best answer

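Yes. If your `url` variable is giving you the correct URL, then submission.title should give you the title. You may run into encoding issues, so you might need to convert it with str(), or get a little fancier with the encode function. Also, certain characters are not allowed in file names on many systems, so try stripping the disallowed characters from the title.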
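The suggestion above can be sketched roughly as follows. This is a minimal illustration, not the answerer's own code: the helper name `title_to_filename`, the `max_length` parameter, the `'untitled'` fallback, and the choice of characters to strip (the set Windows rejects, which also covers `/` on Unix-like systems) are all assumptions. It also assumes Python 3, which the f-string in the question implies, so the title is already a `str` and needs no extra conversion.

```python
import os
import re
from urllib.parse import urlparse

# Assumption: strip the characters Windows rejects in file names, plus control
# characters. '/' is in this set and is also invalid on Unix-like systems.
_INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def title_to_filename(title, url, max_length=150):
    """Build a file name from a post title, keeping the image URL's extension.

    Hypothetical helper for illustration; it is not part of PRAW.
    """
    # Take the extension from the URL path (e.g. '.jpg'), ignoring any query string.
    extension = os.path.splitext(urlparse(url).path)[1]
    # Remove disallowed characters, then collapse runs of whitespace.
    cleaned = _INVALID_CHARS.sub('', title)
    cleaned = re.sub(r'\s+', ' ', cleaned).strip(' .')
    # Truncate long titles and fall back to a placeholder if nothing is left.
    cleaned = cleaned[:max_length].rstrip(' .') or 'untitled'
    return cleaned + extension


if __name__ == '__main__':
    print(title_to_filename('Jupiter: "The Great Red Spot"?',
                            'https://i.redd.it/abc123.jpg'))
```

To wire this into the question's script, one option is to append `(submission.url, submission.title)` pairs to `images` instead of bare URLs, and have `get_img` unpack the pair and use `title_to_filename(title, url)` in place of `what.split('/')[-1]`. Two posts with the same title would overwrite each other, so appending `submission.id` to the name may be worth considering.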

Regarding "python - Using PRAW to scrape subreddit post titles and use them as file names", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/55747537/
