
python - "TypeError: expected string or buffer" when filtering a list with a regular expression

Author: 太空宇宙 · Updated: 2023-11-03 17:06:01

I am currently trying to apply a regular expression in order to filter out certain links from a list of links.

I have tried several approaches by now, but I always get this error:

Traceback (most recent call last):
File "/Users/User/Documents/pyp/pushbullet_updater/DoDa/test.py", line 20, in <module>
print(get_chapter_links(links))
File "/Users/User/Documents/pyp/pushbullet_updater/DoDa/test.py", line 15, in get_chapter_links
match = re.findall(r"https://bluesilvertranslations\.wordpress\.com/\d{4}/\d{2}/\d{2}/douluo-dalu-\d{1,3}-\s*/", link)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/re.py", line 210, in findall
return _compile(pattern, flags).findall(string)
TypeError: expected string or buffer

What am I doing wrong?

Here is the code:

import requests
from bs4 import BeautifulSoup
import re

#Gets chapter links
def get_chapter_links(index_url):
    r = requests.get(index_url)
    soup = BeautifulSoup(r.content, 'lxml')
    links = soup.find_all('a')
    url_list = []
    for url in links:
        url_list.append((url.get('href')))

    for link in url_list: # Iterates through every line and looks for a match:
        match = re.findall(r"https://bluesilvertranslations\.wordpress\.com/\d{4}/\d{2}/\d{2}/douluo-dalu-\d{1,3}-\s*/", link)
    return match

links = 'https://bluesilvertranslations.wordpress.com/chapter-list/'

print(get_chapter_links(links))

Best Answer

From the re documentation:

re.findall(pattern, string, flags=0)
Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match.

New in version 1.5.2.

Changed in version 2.4: Added the optional flags argument.
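
A minimal illustration of the quoted behaviour (the pattern and sample text below are made up for demonstration only):

import re

text = "douluo-dalu-1-chapter-1 and douluo-dalu-2-chapter-5"

# Without groups, findall returns a list of the full matches (strings).
print(re.findall(r"douluo-dalu-\d+", text))
# ['douluo-dalu-1', 'douluo-dalu-2']

# With more than one group, findall returns a list of tuples instead.
print(re.findall(r"douluo-dalu-(\d+)-chapter-(\d+)", text))
# [('1', '1'), ('2', '5')]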

Note:

  • The first argument should be the pattern and the second should be the string (see the sketch below).
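
In your code the arguments are already in that order, but the second argument is sometimes not a string at all: soup.find_all('a') also returns <a> tags that have no href attribute, for which url.get('href') yields None, and that is what re.findall raises the TypeError on. A small sketch of the failure mode (the exact error wording varies between Python versions):

import re

pattern = r"douluo-dalu-\d{1,3}"

print(re.findall(pattern, "douluo-dalu-42"))  # works: ['douluo-dalu-42']

link = None  # what url.get('href') returns for an <a> tag with no href
re.findall(pattern, link)  # TypeError: expected string or buffer
                           # (newer Pythons say "expected string or bytes-like object")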

Modified code:

import requests
from bs4 import BeautifulSoup
import re

# Gets chapter links
def get_chapter_links(index_url):
    r = requests.get(index_url)
    soup = BeautifulSoup(r.content, 'lxml')
    links = soup.find_all('a')
    url_list = []
    for url in links:
        url_list.append(url.get('href'))

    match = []  # Collect the matched links in a list
    for link in url_list:  # Iterate through every href and look for a match
        if link:  # Skip None: <a> tags without an href yield None from .get('href')
            # The regex was changed slightly (-.*/ instead of -\s*/) because the original pattern did not match
            match += re.findall(r"https://bluesilvertranslations\.wordpress\.com/\d{4}/\d{2}/\d{2}/douluo-dalu-\d{1,3}-.*/", link)
    return match

links = 'https://bluesilvertranslations.wordpress.com/chapter-list/'

print(get_chapter_links(links))
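
As a side note, BeautifulSoup can also do the filtering itself: passing a compiled regex as the href argument to find_all only returns <a> tags whose href exists and matches the pattern, so no None values are collected in the first place. This is just a sketch under the assumption that the page structure is the same as above:

import re
import requests
from bs4 import BeautifulSoup

def get_chapter_links(index_url):
    r = requests.get(index_url)
    soup = BeautifulSoup(r.content, 'lxml')
    # href=<compiled regex> keeps only <a> tags whose href attribute matches the pattern
    pattern = re.compile(r"https://bluesilvertranslations\.wordpress\.com/\d{4}/\d{2}/\d{2}/douluo-dalu-\d{1,3}-.*/")
    return [a['href'] for a in soup.find_all('a', href=pattern)]

print(get_chapter_links('https://bluesilvertranslations.wordpress.com/chapter-list/'))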

Regarding python - "TypeError: expected string or buffer" when filtering a list with a regular expression, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/34585286/
