
python - AttributeError: 'DataFrame' object has no attribute 'data_type'

Reposted · Author: 行者123 · Updated: 2023-12-05 05:49:34

I'm getting the following error: AttributeError: 'DataFrame' object has no attribute 'data_type'. I'm trying to recreate the code from this link (article) on my own dataset, similar to the article's:

from sklearn.model_selection import train_test_split

X_train, X_val, y_train, y_val = train_test_split(df.index.values,
                                                  df.label.values,
                                                  test_size=0.15,
                                                  random_state=42,
                                                  stratify=df.label.values)

df['data_type'] = ['not_set']*df.shape[0]

df.loc[X_train, 'data_type'] = 'train'
df.loc[X_val, 'data_type'] = 'val'

df.groupby(['Conference', 'label', 'data_type']).count()

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased',
                                          do_lower_case=True)

encoded_data_train = tokenizer.batch_encode_plus(
    df[df.data_type=='train'].example.values,
    add_special_tokens=True,
    return_attention_mask=True,
    pad_to_max_length=True,
    max_length=256,
    return_tensors='pt'
)

Here is the error I get:

---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_24180/2662883887.py in <module>
3
4 encoded_data_train = tokenizer.batch_encode_plus(
----> 5 df[df.data_type=='train'].example.values,
6 add_special_tokens=True,
7 return_attention_mask=True,

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
5485 ):
5486 return self[name]
-> 5487 return object.__getattribute__(self, name)
5488
5489 def __setattr__(self, name: str, value) -> None:

AttributeError: 'DataFrame' object has no attribute 'data_type'

I'm using Python 3.9; torch 1.10.1; pandas 1.3.5; transformers 4.15.0.

Best Answer

The error means there is no data_type column in your DataFrame, because you missed this step:

from sklearn.model_selection import train_test_split

X_train, X_val, y_train, y_val = train_test_split(df.index.values,
                                                  df.label.values,
                                                  test_size=0.15,
                                                  random_state=42,
                                                  stratify=df.label.values)

df['data_type'] = ['not_set']*df.shape[0]       # <- HERE

df.loc[X_train, 'data_type'] = 'train'          # <- HERE
df.loc[X_val, 'data_type'] = 'val'              # <- HERE

df.groupby(['Conference', 'label', 'data_type']).count()
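The missing step can be sketched end to end on a tiny synthetic DataFrame (hypothetical data standing in for title_conference.csv, which isn't reproduced here):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy DataFrame with two balanced classes (made-up titles).
df = pd.DataFrame({
    'Title': [f'paper {i}' for i in range(20)],
    'label': [i % 2 for i in range(20)],
})

X_train, X_val, y_train, y_val = train_test_split(df.index.values,
                                                  df.label.values,
                                                  test_size=0.25,
                                                  random_state=42,
                                                  stratify=df.label.values)

# Without this assignment, df.data_type does not exist and the
# AttributeError from the question is raised on first access.
df['data_type'] = ['not_set'] * df.shape[0]
df.loc[X_train, 'data_type'] = 'train'
df.loc[X_val, 'data_type'] = 'val'

print(df['data_type'].value_counts())  # 15 train rows, 5 val rows
```

Because `train_test_split` was given `df.index.values`, the returned arrays are row labels, which is why `df.loc[...]` can assign to them directly.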

Demo

  1. Setup

import pandas as pd
from sklearn.model_selection import train_test_split

# The Data
df = pd.read_csv('data/title_conference.csv')
df['label'] = pd.factorize(df['Conference'])[0]

# Train and Validation Split
X_train, X_val, y_train, y_val = train_test_split(df.index.values,
                                                  df.label.values,
                                                  test_size=0.15,
                                                  random_state=42,
                                                  stratify=df.label.values)

df['data_type'] = ['not_set']*df.shape[0]

df.loc[X_train, 'data_type'] = 'train'
df.loc[X_val, 'data_type'] = 'val'
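As an aside, the `pd.factorize` line in the setup maps each distinct `Conference` string to an integer code in order of first appearance. A toy illustration (made-up conference names, not the real CSV):

```python
import pandas as pd

# pd.factorize assigns integer codes in order of first appearance:
# the first unique value gets 0, the next new value gets 1, and so on.
conf = pd.Series(['VLDB', 'ISCA', 'VLDB', 'SIGGRAPH'])
codes, uniques = pd.factorize(conf)
print(list(codes))    # [0, 1, 0, 2]
print(list(uniques))  # ['VLDB', 'ISCA', 'SIGGRAPH']
```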
  2. Code

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased',
                                          do_lower_case=True)

encoded_data_train = tokenizer.batch_encode_plus(
    df[df.data_type=='train'].Title.values,
    add_special_tokens=True,
    return_attention_mask=True,
    pad_to_max_length=True,
    max_length=256,
    return_tensors='pt'
)

Output:

>>> encoded_data_train
{'input_ids': tensor([[ 101, 8144, 1999, ..., 0, 0, 0],
[ 101, 2152, 2836, ..., 0, 0, 0],
[ 101, 22454, 25806, ..., 0, 0, 0],
...,
[ 101, 1037, 2047, ..., 0, 0, 0],
[ 101, 13229, 7375, ..., 0, 0, 0],
[ 101, 2006, 1996, ..., 0, 0, 0]]), 'token_type_ids': tensor([[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1, ..., 0, 0, 0],
[1, 1, 1, ..., 0, 0, 0],
[1, 1, 1, ..., 0, 0, 0],
...,
[1, 1, 1, ..., 0, 0, 0],
[1, 1, 1, ..., 0, 0, 0],
[1, 1, 1, ..., 0, 0, 0]])}
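The tokenizer call only works once `data_type` exists, because `df[df.data_type == 'train']` is an ordinary boolean filter. A minimal sketch of that selection on a toy DataFrame (hypothetical values):

```python
import pandas as pd

# The same boolean-mask selection the tokenizer call relies on
# (toy DataFrame, not the real title_conference.csv).
df = pd.DataFrame({'Title': ['a', 'b', 'c'],
                   'data_type': ['train', 'val', 'train']})

train_titles = df[df.data_type == 'train'].Title.values
print(list(train_titles))  # ['a', 'c']
```

Note that on recent transformers releases `pad_to_max_length=True` emits a deprecation warning; `padding='max_length'` is the current equivalent argument.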

Regarding python - AttributeError: 'DataFrame' object has no attribute 'data_type', a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/70649379/
