
python - pandas.DataFrame.drop_duplicates(inplace=True) throws 'TypeError: unhashable type: 'dict''

Reposted. Author: 太空宇宙. Updated: 2023-11-03 13:58:56

Here is my code:

Block 1

import requests
import pandas as pd

url = ('http://www.omdbapi.com/' '?apikey=ff21610b&t=social+network')
r = requests.get(url)
json_data = r.json()
# from app
print(json_data['Awards'])
json_dict = dict(json_data)
tab=""
# printing all data as Dictionary
print("JSON as Dictionary (all):\n")
for k,v in json_dict.items():
    if len(k) > 6:
        tab = "\t"
    else:
        tab = "\t\t"
    print(str(k) + ":" + tab + str(v))
df = pd.DataFrame(json_dict)
df.drop_duplicates(inplace=True)
# printing Pandas DataFrame of all data
print("JSON as DataFrame (all):\n{}".format(df))

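I was working through a sample exercise on DataCamp and then went on to explore a few other things. The exercise stopped at print(json_data['Awards']). I went further and tried converting the JSON response to a dictionary and building a pandas DataFrame from it. Interestingly, my output is as follows: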

Won 3 Oscars. Another 165 wins & 168 nominations.
JSON as Dictionary (all):

Title: The Social Network
Year: 2010
Rated: PG-13
Released: 01 Oct 2010
Runtime: 120 min
Genre: Biography, Drama
Director: David Fincher
Writer: Aaron Sorkin (screenplay), Ben Mezrich (book)
Actors: Jesse Eisenberg, Rooney Mara, Bryan Barter, Dustin Fitzsimons
Plot: Harvard student Mark Zuckerberg creates the social networking site that would become known as Facebook, but is later sued by two brothers who claimed he stole their idea, and the co-founder who was later squeezed out of the business.
Language: English, French
Country: USA
Awards: Won 3 Oscars. Another 165 wins & 168 nominations.
Poster: https://m.media-amazon.com/images/M/MV5BMTM2ODk0NDAwMF5BMl5BanBnXkFtZTcwNTM1MDc2Mw@@._V1_SX300.jpg
Ratings: [{'Source': 'Internet Movie Database', 'Value': '7.7/10'}, {'Source': 'Rotten Tomatoes', 'Value': '96%'}, {'Source': 'Metacritic', 'Value': '95/100'}]
Metascore: 95
imdbRating: 7.7
imdbVotes: 542,658
imdbID: tt1285016
Type: movie
DVD: 11 Jan 2011
BoxOffice: $96,400,000
Production: Columbia Pictures
Website: http://www.thesocialnetwork-movie.com/
Response: True
Traceback (most recent call last):
  File "C:\Users\rschosta\OneDrive - Incitec Pivot Limited\Documents\Data Science\omdb-api-test.py", line 20, in <module>
    df.drop_duplicates(inplace=True)
  File "C:\Users\rschosta\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py", line 3535, in drop_duplicates
    duplicated = self.duplicated(subset, keep=keep)
  File "C:\Users\rschosta\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py", line 3582, in duplicated
    labels, shape = map(list, zip(*map(f, vals)))
  File "C:\Users\rschosta\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py", line 3570, in f
    vals, size_hint=min(len(self), _SIZE_HINT_LIMIT))
  File "C:\Users\rschosta\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\algorithms.py", line 471, in factorize
    labels = table.get_labels(values, uniques, 0, na_sentinel, check_nulls)
  File "pandas/_libs/hashtable_class_helper.pxi", line 1367, in pandas._libs.hashtable.PyObjectHashTable.get_labels
TypeError: unhashable type: 'dict'

I have been doing some research on .drop_duplicates(), because I have used it before and it worked fine. Here is sample code where it works correctly:

Block 2

import pandas as pd
import numpy as np

#Create a DataFrame
d = {
    'Name':['Alisa','Bobby','jodha','jack','raghu','Cathrine',
            'Alisa','Bobby','kumar','Alisa','Alex','Cathrine'],
    'Age':[26,24,23,22,23,24,26,24,22,23,24,24],
    'Score':[85,63,55,74,31,77,85,63,42,62,89,77]}

df = pd.DataFrame(d,columns=['Name','Age','Score'])
print(df)
df.drop_duplicates(keep=False, inplace=True)
print(df)

Note that the two code blocks differ in a few ways. I also tried importing numpy as np in my first script, and it did not change the result.

Any ideas on how to get the drop_duplicates() method to work on Block 1?

Output of Block 1 - A

At @Wen's request, here is the data in dictionary form:

{'Title': 'The Social Network', 'Year': '2010', 'Rated': 'PG-13', 'Released': '01 Oct 2010', 'Runtime': '120 min', 'Genre': 'Biography, Drama', 'Director': 'David Fincher', 'Writer': 'Aaron Sorkin (screenplay), Ben Mezrich (book)', 'Actors': 'Jesse Eisenberg, Rooney Mara, Bryan Barter, Dustin Fitzsimons', 'Plot': 'Harvard student Mark Zuckerberg creates the social networking site that would become known as Facebook, but is later sued by two brothers who claimed he stole their idea, and the co-founder who was later squeezed out of the business.', 'Language': 'English, French', 'Country': 'USA', 'Awards': 'Won 3 Oscars. Another 165 wins & 168 nominations.', 'Poster': 'https://m.media-amazon.com/images/M/MV5BMTM2ODk0NDAwMF5BMl5BanBnXkFtZTcwNTM1MDc2Mw@@._V1_SX300.jpg', 'Ratings': [{'Source': 'Internet Movie Database', 'Value': '7.7/10'}, {'Source': 'Rotten Tomatoes', 'Value': '96%'}, {'Source': 'Metacritic', 'Value': '95/100'}], 'Metascore': '95', 'imdbRating': '7.7', 'imdbVotes': '542,658', 'imdbID': 'tt1285016', 'Type': 'movie', 'DVD': '11 Jan 2011', 'BoxOffice': '$96,400,000', 'Production': 'Columbia Pictures', 'Website': 'http://www.thesocialnetwork-movie.com/', 'Response': 'True'}

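For now I am not calling the .drop_duplicates() method until the Ratings dictionaries have been converted into columns. I also have more output, with the dictionary printed as a tab-separated list that is easier to read: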

Title:      The Social Network
Year: 2010
Rated: PG-13
Released: 01 Oct 2010
Runtime: 120 min
Genre: Biography, Drama
Director: David Fincher
Writer: Aaron Sorkin (screenplay), Ben Mezrich (book)
Actors: Jesse Eisenberg, Rooney Mara, Bryan Barter, Dustin Fitzsimons
Plot: Harvard student Mark Zuckerberg creates the social networking site that would become known as Facebook, but is later sued by two brothers who claimed he stole their idea, and the co-founder who was later squeezed out of the business.
Language: English, French
Country: USA
Awards: Won 3 Oscars. Another 165 wins & 168 nominations.
Poster: https://m.media-amazon.com/images/M/MV5BMTM2ODk0NDAwMF5BMl5BanBnXkFtZTcwNTM1MDc2Mw@@._V1_SX300.jpg
Ratings: [{'Source': 'Internet Movie Database', 'Value': '7.7/10'}, {'Source': 'Rotten Tomatoes', 'Value': '96%'}, {'Source': 'Metacritic', 'Value': '95/100'}]
Metascore: 95
imdbRating: 7.7
imdbVotes: 542,658
imdbID: tt1285016
Type: movie
DVD: 11 Jan 2011
BoxOffice: $96,400,000
Production: Columbia Pictures
Website: http://www.thesocialnetwork-movie.com/
Response: True

Best answer

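You have a Ratings column full of dictionaries, so you cannot use drop_duplicates on it: dicts are mutable and therefore not hashable, and drop_duplicates needs to hash every value it compares.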
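To see the underlying restriction outside pandas, here is a minimal sketch in plain Python. The `rating` dict is copied from the Ratings list in the question; the other variable names are only illustrative.

```python
# A dict is mutable, so Python does not define a hash for it.
rating = {'Source': 'Metacritic', 'Value': '95/100'}

try:
    hash(rating)
    dict_is_hashable = True
    error_message = ''
except TypeError as exc:
    dict_is_hashable = False
    error_message = str(exc)

# A frozenset of the (key, value) pairs is immutable and hashable,
# provided every key and value is itself hashable (strings are).
frozen = frozenset(rating.items())
frozen_hash_is_int = isinstance(hash(frozen), int)

print(dict_is_hashable)    # False
print(error_message)
print(frozen_hash_is_int)  # True
```

The frozenset also ignores key order, so two dicts with the same pairs in a different order compare equal after conversion.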

As a workaround, you can convert those values to a frozenset of (key, value) tuples and then use drop_duplicates:

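# Replace each dict with a hashable frozenset of its (key, value) pairs
df['Ratings'] = df.Ratings.transform(lambda k: frozenset(k.items()))
# drop_duplicates returns a new DataFrame unless inplace=True is passed
df = df.drop_duplicates()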
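As a self-contained sketch of this approach, the snippet below builds a small DataFrame shaped like the one in the question, with scalar fields broadcast across one row per entry in the Ratings list. The repeated third rating is an assumption added purely so there is something to drop, and `hashable` and `deduped` are illustrative names. It uses `Series.map`, which should behave like the `transform` call above for an element-wise function.

```python
import pandas as pd

# Miniature of the question's data; the last rating is a deliberate
# duplicate added for illustration (it is not in the real API response).
ratings = [
    {'Source': 'Internet Movie Database', 'Value': '7.7/10'},
    {'Source': 'Rotten Tomatoes', 'Value': '96%'},
    {'Source': 'Rotten Tomatoes', 'Value': '96%'},
]
df = pd.DataFrame({
    'Title': 'The Social Network',  # scalars are repeated for every row
    'Year': '2010',
    'Ratings': ratings,             # one dict per row
})

# Convert each dict into a hashable frozenset of (key, value) pairs,
# leaving the original DataFrame untouched.
hashable = df.assign(Ratings=df['Ratings'].map(lambda d: frozenset(d.items())))

# With hashable cells, drop_duplicates can compare whole rows.
deduped = hashable.drop_duplicates()

print(len(df), len(deduped))  # 3 2
```

One trade-off of this sketch: the Ratings cells stay frozensets afterwards, so convert them back with `dict(...)` if later code expects dictionaries.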

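Alternatively, select only the columns you want to use as the reference. For example, if you only want to drop duplicates based on Year and Title, you could do something like this: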

ref_cols = ['Title', 'Year']
df.loc[~df[ref_cols].duplicated()]
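The same idea can be written with the `subset` parameter of `drop_duplicates`, which limits the comparison to the listed columns so the dict column is never hashed. The sketch below uses made-up rows ('Film A', 'Film B' and their ratings are placeholders, not real data) to show that both spellings keep the same rows.

```python
import pandas as pd

# Placeholder rows for illustration only; the Ratings column holds dicts.
df = pd.DataFrame({
    'Title': ['Film A', 'Film A', 'Film B'],
    'Year': ['2010', '2010', '2007'],
    'Ratings': [
        {'Source': 'Site X', 'Value': '1/10'},
        {'Source': 'Site Y', 'Value': '2/10'},
        {'Source': 'Site X', 'Value': '3/10'},
    ],
})

ref_cols = ['Title', 'Year']

# Boolean-mask form, as in the answer above.
via_mask = df.loc[~df[ref_cols].duplicated()]

# Equivalent form using the subset parameter.
via_subset = df.drop_duplicates(subset=ref_cols)

print(list(via_mask.index))    # [0, 2]
print(list(via_subset.index))  # [0, 2]
```

Note that this keeps only the first row for each Title and Year pair, so the other Ratings entries for that film are discarded.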

Regarding python - pandas.DataFrame.drop_duplicates(inplace=True) throws 'TypeError: unhashable type: 'dict'', a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/51623901/
