
Python MySQL: data inserted twice

Reposted. Author: 行者123. Last updated: 2023-11-29 05:59:26

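When a single hash is found in a tweet's URLs, the script inserts the values into the MySQL database correctly. When two or more hashes are found in the tweet's URLs, the records are inserted into the MySQL database twice.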

For example, if a tweet has two URLs that each contain a hash, four records are created in the MySQL database.

Database state:

"https://www.virustotal.com/en/file/2819e520dea611c4dd1c3b1fd54adbd0c50963ff75d67cc7facbe2090574afc0/analysis/","2017-09-20 01:00:35","2819e520dea611c4dd1c3b1fd54adbd0c50963ff75d67cc7facbe2090574afc0"
"https://www.virustotal.com/en/file/8084880e875b4dc97ccd9f97249d4c7184f6be092679d2b272ece2890306ca89/analysis/","2017-09-20 01:03:35","8084880e875b4dc97ccd9f97249d4c7184f6be092679d2b272ece2890306ca89"
"https://www.virustotal.com/en/file/b5034183d4d2aca1e586b4a4bf22f32e4204c4b6d288c171d5252636c11248a0/analysis/","2017-09-20 01:03:35","8084880e875b4dc97ccd9f97249d4c7184f6be092679d2b272ece2890306ca89"
"https://www.virustotal.com/en/file/8084880e875b4dc97ccd9f97249d4c7184f6be092679d2b272ece2890306ca89/analysis/","2017-09-20 01:03:35","b5034183d4d2aca1e586b4a4bf22f32e4204c4b6d288c171d5252636c11248a0"
"https://www.virustotal.com/en/file/b5034183d4d2aca1e586b4a4bf22f32e4204c4b6d288c171d5252636c11248a0/analysis/","2017-09-20 01:03:35","b5034183d4d2aca1e586b4a4bf22f32e4204c4b6d288c171d5252636c11248a0"

Any suggestions on how to insert only a single entry into the database?

#! /usr/bin/python

from __future__ import print_function
import tweepy
import json
import MySQLdb
import time
import json, urllib, urllib2, argparse, hashlib, re, sys
from dateutil import parser

WORDS = ['virustotal']

CONSUMER_KEY = "XXXX"
CONSUMER_SECRET = "YYY"
ACCESS_TOKEN = "AAAA"
ACCESS_TOKEN_SECRET = "DDDDD"


HOST = "192.168.150.1"
USER = "admin"
PASSWD = "admin"
DATABASE = "twitter"


def store_data(values, insert_time, insert_hash):
    db = MySQLdb.connect(host=HOST, user=USER, passwd=PASSWD, db=DATABASE, charset="utf8")
    cursor = db.cursor()
    data = []
    #print(hashes)
    for value in values:
        data.append((value, insert_time, insert_hash))
    cursor.executemany("""INSERT INTO tweet_url VALUES (%s,%s,%s)""", data)
    db.commit()
    cursor.close()
    db.close()
    return

class StreamListener(tweepy.StreamListener):

    def on_connect(self):
        print("We are now connected to the streaming API.")

    def on_error(self, status_code):
        print('An Error has occured: ' + repr(status_code))
        return False

    def on_data(self, data):
        try:
            datajson = json.loads(data)
            web_url = datajson['entities']['urls']
            #print(web_url)
            urls = []
            for i in web_url:
                urls.append((i['expanded_url']))
            values = [list([item]) for item in urls]
            list_url = ','.join([str(i) for i in values])
            extract_url = str(list_url)
            formatted_url = ''.join(extract_url)
            sha256_hash = re.findall(r"([a-fA-F\d]{64})", formatted_url)
            hashes = ''.join(sha256_hash)
            insert_time = time.strftime('%Y-%m-%d %H:%M:%S')
            hash_list = re.findall(r"([a-fA-F\d]{64})", hashes)
            for insert_hash in hash_list:
                store_data(values, insert_time, insert_hash)
            print(store_data)
            print(hashes)
            print(type(hashes))
        except Exception as e:
            print(e)



auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
listener = StreamListener(api=tweepy.API(wait_on_rate_limit=True))
streamer = tweepy.Stream(auth=auth, listener=listener)
print("Tracking: " + str(WORDS))
streamer.filter(track=WORDS)

Best answer

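You have a first loop, in on_data():

for insert_hash in hash_list:
    store_data(values, insert_time, insert_hash)

Then, inside store_data(), you loop over the values again to build the list of data tuples:

for value in values:
    data.append((value, insert_time, insert_hash))

So the values are iterated once for every hash: each call to store_data() inserts every URL in values with the current hash. With two URLs and two hashes, that produces 2 x 2 = 4 rows, which matches the database state shown in the question.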
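
To see why two hashes yield four rows, the two loops can be replayed without a database. This is only a minimal sketch: `build_rows_nested` is a hypothetical helper name, and the URLs and hashes are shortened placeholders rather than the real values from the question.

```python
def build_rows_nested(values, insert_time, hash_list):
    """Replay the original control flow: one store_data() call per hash,
    and each call appends one row per URL."""
    rows = []
    for insert_hash in hash_list:      # outer loop, from on_data()
        for value in values:           # inner loop, from store_data()
            rows.append((value, insert_time, insert_hash))
    return rows


# Placeholder data (assumed for illustration only).
values = ["https://example.invalid/file/aaa/analysis/",
          "https://example.invalid/file/bbb/analysis/"]
hash_list = ["aaa", "bbb"]
rows = build_rows_nested(values, "2017-09-20 01:03:35", hash_list)
print(len(rows))  # 2 URLs x 2 hashes = 4 rows
```

The second row pairs the "bbb" URL with the "aaa" hash, which is the same kind of mismatch visible in the database dump above.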


Maybe you could use zip() or enumerate() to pair hash_list with values before calling store_data()?

data = []
if len(values) == len(hash_list):
    for val, url_hash in zip(values, hash_list):
        data.append((val, insert_time, url_hash))
    store_data(data)
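
Note that the length check skips the insert entirely when a tweet mixes URLs with and without a hash. One possible alternative, sketched here under that assumption, is to extract the hash from each URL individually so that a URL is only ever paired with its own hash. `pair_urls_with_hashes` is a hypothetical helper, not part of the original script; it reuses the 64-hex-digit pattern from the question and takes plain URL strings rather than the nested `values` list.

```python
import re

# Same 64-hex-digit pattern the question uses for SHA-256 hashes.
SHA256_RE = re.compile(r"[a-fA-F\d]{64}")


def pair_urls_with_hashes(urls, insert_time):
    """Return one (url, insert_time, hash) tuple per URL that contains a hash."""
    rows = []
    for url in urls:
        match = SHA256_RE.search(url)
        if match:  # URLs without a SHA-256 hash are skipped
            rows.append((url, insert_time, match.group(0)))
    return rows


# Synthetic hashes ("a" * 64, "b" * 64) stand in for real ones.
urls = [
    "https://www.virustotal.com/en/file/" + "a" * 64 + "/analysis/",
    "https://example.invalid/no-hash-here",
    "https://www.virustotal.com/en/file/" + "b" * 64 + "/analysis/",
]
rows = pair_urls_with_hashes(urls, "2017-09-20 01:03:35")
print(len(rows))  # 2
```

The resulting list has the same tuple shape as `data` above, so it could be handed to a batch insert in a single call.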

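Then there is no need to loop again inside store_data(); just change its signature so that the data list is passed in directly:

def store_data(data_list):
    # connect to the database and create the cursor, as before
    cursor.executemany("""INSERT INTO tweet_url VALUES (%s,%s,%s)""", data_list)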
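
A fuller version of the reworked function might look like the sketch below. To keep it runnable without a MySQL server, it takes an already-open DB-API connection and is exercised against an in-memory `sqlite3` database as a stand-in; with MySQLdb the placeholder would be `%s` rather than `?`. The `conn` and `placeholder` parameters and the column names `url`, `insert_time`, and `hash` are assumptions, since the original table definition is not shown.

```python
import sqlite3


def store_data(conn, data_list, placeholder="%s"):
    """Insert pre-built (url, insert_time, hash) tuples in one batch."""
    sql = "INSERT INTO tweet_url VALUES ({0},{0},{0})".format(placeholder)
    cursor = conn.cursor()
    try:
        cursor.executemany(sql, data_list)
        conn.commit()
    finally:
        cursor.close()


# Stand-in database; the column names are assumed for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweet_url (url TEXT, insert_time TEXT, hash TEXT)")
data = [
    ("https://example.invalid/file/aaa/analysis/", "2017-09-20 01:03:35", "aaa"),
    ("https://example.invalid/file/bbb/analysis/", "2017-09-20 01:03:35", "bbb"),
]
store_data(conn, data, placeholder="?")
row_count = conn.execute("SELECT COUNT(*) FROM tweet_url").fetchone()[0]
print(row_count)  # 2
```

Because the function no longer loops over URLs itself, it is called once per tweet and inserts exactly the rows it is given.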

Regarding "Python MySQL: data inserted twice", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/46308763/
