
python - How to construct a re.findall regex in Python to capture YouTube timestamps

Reposted. Author: 行者123. Updated: 2023-12-04 17:13:09

Script

from __future__ import unicode_literals
import youtube_dl
import pandas as pd
import csv
import re

# Initialize YouTube-DL Array
ydl_opts = {}

# read the csv file
number_of_rows = pd.read_csv('single.csv')

# Scrape Online Product
def run_scraper():

    # Read CSV to List
    with open("single.csv", "r") as f:
        csv_reader = csv.reader(f)
        next(csv_reader)

        # Scrape Data From Store
        for csv_line_entry in csv_reader:

            with youtube_dl.YoutubeDL(ydl_opts) as ydl:
                meta = ydl.extract_info(csv_line_entry[0], download=False)
                description = meta['description']
                #print('Description :', description)

                # Function to Capture Timestamp Descriptions
                get_links(description)


def get_links(description):

    # Format: Timestamp + Text
    description_text = re.findall(r'(\d{2}:\d{2}?.*)', description)
    print(description_text)
    print()

    # Format: Text + Timestamp
    description_text1 = re.findall(r'(.*\d{2}:\d{2}?)', description)
    print(description_text1)

run_scraper()

CSV file

Videos, Format
https://www.youtube.com/watch?v=kqtD5dpn9C8, Format: Timestamp + Text
https://www.youtube.com/watch?v=pJ3IPRqiD2M, Format: Text + Timestamp
https://www.youtube.com/watch?v=rfscVS0vtbw, No Regex in code
https://www.youtube.com/watch?v=t8pPdKYpowI, No Regex in code

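My script reads YouTube URLs from a CSV file and fetches each video's general description, which can include an introduction, links, timestamps, and so on.

I want to capture only the timestamp entries in the YouTube description, as highlighted in the image below:
(image not available)

I understand that YouTube timestamp formats are inconsistent, so I included several examples in the CSV file.

In my get_links function, I have had partial success: it extracts the entries for 2 of the 4 URLs listed in the CSV, namely those in the Timestamp + Text and Text + Timestamp formats.

I need a way to display only the text (description) part of each timestamp entry, regardless of which format is used, for all 4 URLs in the CSV.

Any help would be appreciated.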
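
As a small illustration of one way the patterns in get_links can miss entries, the sketch below runs the "Timestamp + Text" pattern against three made-up description lines. The sample lines and the helper name find_timestamp_first are assumptions for illustration only; they are not taken from the four videos in the CSV. The pattern requires exactly two digits before the colon, so a timestamp written as 0:00 is not matched, and when a line does match, the timestamp itself stays in the result.

```python
import re

# Same "Timestamp + Text" pattern as in get_links above.
timestamp_first = re.compile(r'(\d{2}:\d{2}?.*)')


def find_timestamp_first(line):
    """Hypothetical helper: return the findall() result for one line."""
    return timestamp_first.findall(line)


# Made-up sample lines (assumed formats, not real video descriptions).
sample_lines = [
    "00:00 Introduction",    # two digits before the colon
    "(0:00) Introduction",   # one digit before the colon
    "Introduction 0:00",     # text first, one digit before the colon
]

for line in sample_lines:
    print(repr(line), "->", find_timestamp_first(line))
```

Only the first sample line produces a match, and that match is the whole line including the timestamp, which is not yet the text-only result asked for above.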

Best answer

Try:

import youtube_dl
import pandas as pd
import csv
import re

# Initialize YouTube-DL Array
ydl_opts = {}

r_pat = re.compile(r"\d+:\d+")
r_pat2 = re.compile(r"[^A-Za-z]*\d+:\d+:?\d*?[^A-Za-z]*")

# Scrape Online Product
def run_scraper():

    # Read CSV to List
    with open("single.csv", "r") as f:
        csv_reader = csv.reader(f)
        next(csv_reader)

        # Scrape Data From Store
        for csv_line_entry in csv_reader:
            with youtube_dl.YoutubeDL(ydl_opts) as ydl:
                meta = ydl.extract_info(csv_line_entry[0], download=False)
                description = meta["description"]
                out = get_links(description)
                print(*out, sep="\n")
                print("-" * 80)


def get_links(description):
    rv = []
    for line in description.splitlines():
        if r_pat.search(line):
            rv.append(r_pat2.sub("", line))
    return rv


run_scraper()

Prints:

[youtube] kqtD5dpn9C8: Downloading webpage
Introduction
What You Can Do With Python
Your First Python Program
Variables
Receiving Input
Type Conversion
Strings
Arithmetic Operators
Operator Precedence
Comparison Operators
Logical Operators
If Statements
Exercise
While Loops
Lists
List Methods
For Loops
The range() Function
Tuples
--------------------------------------------------------------------------------
[youtube] pJ3IPRqiD2M: Downloading webpage
Python Course
What is Python
Why choose Python
Features of Python
Applications of Python
Salary Trends
Quiz
Installing Python
Python Variable
Python Tokens


...and so on.
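
The two-pattern approach can also be tried without any network access. The sketch below reuses r_pat, r_pat2 and get_links exactly as written in the answer and feeds them a made-up description; sample_description is an assumption for illustration and is not the real description of any video in the CSV. r_pat selects the lines that contain something shaped like a timestamp, and r_pat2 then removes the timestamp together with any non-letter characters touching it.

```python
import re

# Patterns copied from the answer above.
r_pat = re.compile(r"\d+:\d+")
r_pat2 = re.compile(r"[^A-Za-z]*\d+:\d+:?\d*?[^A-Za-z]*")


def get_links(description):
    """Return the text part of every line that contains a timestamp."""
    rv = []
    for line in description.splitlines():
        if r_pat.search(line):                # keep only timestamp lines
            rv.append(r_pat2.sub("", line))   # strip timestamp and nearby symbols
    return rv


# Made-up description mixing several timestamp formats (assumption).
sample_description = """\
Learn Python in this course.
00:00:00 Introduction
00:01:49 Installing Python
Variables - 12:30
(1:05:10) For Loops
Subscribe for more videos!
"""

titles = get_links(sample_description)
print(*titles, sep="\n")
```

One caveat with this design: r_pat2 removes every non-letter character adjacent to the timestamp, so a title that starts with a digit, or with a letter outside A-Z and a-z, would lose its leading characters. If such titles are expected, the character class would need adjusting.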

For more on "python - How to construct a re.findall regex in Python to capture YouTube timestamps", see the similar question we found on Stack Overflow: https://stackoverflow.com/questions/69121425/
