I wrote some code for an Elasticsearch search in which I pass movie_name in as search_term. It filters the hits against a Jaro-Winkler condition, i.e.

for i in es_data:
    if (i['_source']['entity_type'] == 'movie_entity'):
        dist = distance.get_jaro_distance(search_term, i['_source']['entity_name'], winkler=True, scaling=0.1)
        if dist > 0.80:

This code returns the correct output, but when there is no match I get an error. I have tried other statements, but the error keeps occurring.
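(For reference, a minimal standalone check of the pyjarowinkler call used in that condition; the two titles below are made up purely for illustration.)

from pyjarowinkler import distance

# Jaro-Winkler similarity between a search term and a candidate entity name:
# the result is a float in [0, 1], and the spider only keeps hits scoring above 0.80.
score = distance.get_jaro_distance('Avengers Endgame', 'Avengers: Endgame',
                                   winkler=True, scaling=0.1)
print(score)

The full spider code and the resulting traceback follow.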
from ..items import DeccanchronicleItem
import mysql.connector
from mysql.connector import Error
from mysql.connector import errorcode
from elasticsearch import Elasticsearch, RequestsHttpConnection
from requests_aws4auth import AWS4Auth
import boto3
import spacy
import fuzzy
from pyjarowinkler import distance
import json
import scrapy
import re


class DeccanchronicleSpider(scrapy.Spider):
    name = 'a_review'
    page_number = 2
    start_urls = ['https://www.deccanchronicle.com/entertainment/movie-review?pg=1']

    def parse(self, response):
        items = {}
        i = 1
        movie_title = response.xpath('//*[@id="fullBody"]/div[4]/div[3]/div[1]/div[*]/div[2]/a/h3/text()').getall()
        movie_text = response.xpath('//*[@id="fullBody"]/div[4]/div[3]/div[1]/div[*]/div[2]/a/div[1]/text()').getall()
        movie_id = response.xpath('//*[@id="fullBody"]/div[4]/div[3]/div[1]/div[*]/div[2]/a/@href').getall()
        items['movie_title'] = movie_title
        items['movie_text'] = movie_text
        items['movie_id'] = movie_id
        li = items['movie_title']
        for i in range(len(li)):
            li_split = li[i].split(" ")
            # print(movietitle)
            if 'Review:' in li_split or 'review:' in li_split:
                outputs = DeccanchronicleItem()
                outputs['page_title'] = li[i]
                outputs['review_content'] = items['movie_text'][i]
                outputs['review_link'] = 'https://www.deccanchronicle.com' + str(items['movie_id'][i])
                nlp = spacy.load('/Users/divyanshu/review_bot/review_bot/NER_model')

                def actor_mid_ner(sentence):
                    doc = nlp(sentence)
                    detected_hash = {}
                    # detected_hash = { ent.label_ : ([ent.text] if ent.label_ is None else ) for ent in doc.ents}
                    for ent in doc.ents:
                        label = ent.label_
                        detected = detected_hash.keys()
                        omit = ['Unwanted']
                        if label not in omit:
                            if label not in detected:
                                detected_hash[label] = [ent.text]
                            else:
                                detected_hash[label].append(ent.text)
                        else:
                            detected_hash[label] = [ent.text]
                    return detected_hash, detected

                sentence = outputs['page_title']
                ner_hash, ner_keys = actor_mid_ner(sentence)
                movie_name = " ".join(str(x) for x in ner_hash['MOVIE'])
                print('-----------------------------------')
                print(movie_name)
                print('-----------------------------------')

                def elasticsearch(movie_name):
                    search_term = movie_name
                    host = 'xxxxxxxxxxxxxxx'  # For example, my-test-domain.us-east-1.es.amazonaws.com
                    region = 'ap-southeast-1'  # e.g. us-west-1
                    service = 'es'
                    credentials = boto3.Session().get_credentials()
                    awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, region, service, session_token=credentials.token)
                    es = Elasticsearch(
                        hosts=[{'host': host, 'port': 443}],
                        http_auth=awsauth,
                        use_ssl=True,
                        verify_certs=True,
                        connection_class=RequestsHttpConnection
                    )
                    body = {
                        "query": {
                            "multi_match": {
                                "query": search_term,
                                "fields": ["entity_name", "aka"],
                                "fuzziness": "AUTO"
                            }
                        }
                    }
                    res = es.search(index="production-widget_id_search", body=body)
                    es_data = res['hits']['hits']
                    # print(es_data)
                    for i in es_data:
                        if (i['_source']['entity_type'] == 'movie_entity'):
                            dist = distance.get_jaro_distance(search_term, i['_source']['entity_name'], winkler=True, scaling=0.1)
                            if dist > 0.80:
                                return (i['_source']['entity_id'], i['_source']['entity_name'])

                movie_id, movie_name_es = elasticsearch(movie_name)
                review_url = outputs['review_link']
                print('-----------------------------------')
                print(movie_id)
                print('-----------------------------------')
                print(movie_name)
                print('-----------------------------------')
                print(movie_name_es)
                print('-----------------------------------')
                print(review_url)
                print('***********************************')

                try:
                    connection = mysql.connector.connect(host='localhost',
                                                         database='review_url',
                                                         user='root',
                                                         password='admin')
                    mySql_insert_query = """INSERT INTO k_master_movie_reviews (id, title, title_es, url)
                                            VALUES(%s,%s,%s,%s)""", (movie_id, movie_name, movie_name_es, review_url)
                    cursor = connection.cursor()
                    cursor.execute(mySql_insert_query)
                    connection.commit()
                    print(cursor.rowcount, "Record inserted successfully into table")
                    cursor.close()
                except mysql.connector.Error as error:
                    print("Failed to insert record into table {}".format(error))
                finally:
                    if (connection.is_connected()):
                        connection.close()
                        print("MySQL connection is closed")

                outputs['id'] = movie_id
                outputs['title'] = movie_name
                outputs['title_es'] = movie_name_es
                outputs['url'] = review_url
                yield outputs
                pass

        next_page = 'https://www.deccanchronicle.com/entertainment/movie-review?pg=' + str(DeccanchronicleSpider.page_number)
        if DeccanchronicleSpider.page_number <= 5:
            DeccanchronicleSpider.page_number += 1
            yield response.follow(next_page, callback=self.parse)
This is the error I get:
Traceback (most recent call last):
  File "/Users/divyanshu/env/lib/python3.7/site-packages/scrapy/utils/defer.py", line 117, in iter_errback
    yield next(it)
  File "/Users/divyanshu/env/lib/python3.7/site-packages/scrapy/utils/python.py", line 345, in __next__
    return next(self.data)
  File "/Users/divyanshu/env/lib/python3.7/site-packages/scrapy/utils/python.py", line 345, in __next__
    return next(self.data)
  File "/Users/divyanshu/env/lib/python3.7/site-packages/scrapy/core/spidermw.py", line 64, in _evaluate_iterable
    for r in iterable:
  File "/Users/divyanshu/env/lib/python3.7/site-packages/scrapy/spidermiddlewares/offsite.py", line 29, in process_spider_output
    for x in result:
  File "/Users/divyanshu/env/lib/python3.7/site-packages/scrapy/core/spidermw.py", line 64, in _evaluate_iterable
    for r in iterable:
  File "/Users/divyanshu/env/lib/python3.7/site-packages/scrapy/spidermiddlewares/referer.py", line 338, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "/Users/divyanshu/env/lib/python3.7/site-packages/scrapy/core/spidermw.py", line 64, in _evaluate_iterable
    for r in iterable:
  File "/Users/divyanshu/env/lib/python3.7/site-packages/scrapy/spidermiddlewares/urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/Users/divyanshu/env/lib/python3.7/site-packages/scrapy/core/spidermw.py", line 64, in _evaluate_iterable
    for r in iterable:
  File "/Users/divyanshu/env/lib/python3.7/site-packages/scrapy/spidermiddlewares/depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/Users/divyanshu/env/lib/python3.7/site-packages/scrapy/core/spidermw.py", line 64, in _evaluate_iterable
    for r in iterable:
  File "/Users/divyanshu/review_bot/review_bot/spiders/a.py", line 515, in parse
    movie_id , movie_name_es = elasticsearch(movie_name)
TypeError: cannot unpack non-iterable NoneType object
Best answer
That happens because, when there are no matches, your elasticsearch() function returns None, which you then immediately try to unpack into movie_id and movie_name_es. I suggest adding return (None, None) at the end of the elasticsearch() function.
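A minimal, self-contained sketch of the pattern the answer suggests. Here find_movie stands in for the question's elasticsearch() function, and the hit list is passed in as an argument (left empty in the usage line) purely to make the no-match branch easy to exercise; the real function would keep the AWS-signed Elasticsearch query from the question.

from pyjarowinkler import distance

def find_movie(search_term, es_hits):
    """Return (entity_id, entity_name) for the first close match, else (None, None)."""
    for hit in es_hits:
        source = hit['_source']
        if source['entity_type'] == 'movie_entity':
            dist = distance.get_jaro_distance(search_term, source['entity_name'],
                                              winkler=True, scaling=0.1)
            if dist > 0.80:
                return (source['entity_id'], source['entity_name'])
    # Nothing scored above 0.80: return an unpackable pair instead of the
    # implicit None that caused the TypeError in the traceback.
    return (None, None)

# Call site: the tuple unpacking now always succeeds, and the "no match"
# case can be detected and skipped explicitly.
movie_id, movie_name_es = find_movie('Some Title', [])
if movie_id is None:
    print('no sufficiently similar movie_entity found; skipping this review')

Alternatively, the call site could keep elasticsearch() unchanged and check the returned value for None before unpacking; either way the unpacking error goes away.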
Regarding "python - Elasticsearch query is not returning the correct response", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/62570817/