
python - Duplicate columns when querying SQLAlchemy into a Pandas DF?

Reposted. Author: 太空宇宙. Updated: 2023-11-04 05:49:48

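I'm building a Python database layer for analytics on top of a star-schema database, but I'm having trouble integrating pandas and SQLAlchemy because of duplicate column keys in the DataFrame.

Here are the classes: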

# Imports added for completeness (assumes SQLAlchemy 1.4+; older versions
# import declarative_base from sqlalchemy.ext.declarative).
from sqlalchemy import Column, ForeignKey, Integer, String
from sqlalchemy.orm import backref, declarative_base, relationship

Base = declarative_base()

class Student(Base):
    __tablename__ = 'DimStudent'

    id = Column('StudentKey', Integer, primary_key=True)
    srcstudentid = Column('SrcStudentId', Integer)
    firstname = Column('FirstName', String)
    middlename = Column('MiddleName', String)
    lastname = Column('LastName', String)
    lep = Column('LimitedEnglishProficiency', String)
    frl = Column('FreeReducedLunch', String)
    sped = Column('SpecialEducation', String)

class School(Base):
    __tablename__ = 'DimSchool'

    id = Column('SchoolKey', Integer, primary_key=True)
    name = Column('SchoolName', String)
    district = Column('SchoolDistrict', String)
    statecode = Column('StateCode', String)

class StudentScore(Base):
    __tablename__ = 'FactStudentScore'

    # Composite primary key: one foreign key per dimension table
    studentkey = Column('StudentKey', Integer, ForeignKey('DimStudent.StudentKey'), primary_key=True)
    teacherkey = Column('TeacherKey', Integer, ForeignKey('DimTeacher.TeacherKey'), primary_key=True)
    schoolkey = Column('SchoolKey', Integer, ForeignKey('DimSchool.SchoolKey'), primary_key=True)
    assessmentkey = Column('AssessmentKey', Integer, ForeignKey('DimAssessment.AssessmentKey'), primary_key=True)
    subjectkey = Column('SubjectKey', Integer, ForeignKey('DimSubject.SubjectKey'), primary_key=True)
    yearcyclekey = Column('YearCycleKey', Integer, ForeignKey('DimYearCycle.YearCycleKey'), primary_key=True)
    pointspossible = Column('PointsPossible', Integer)
    pointsreceived = Column('PointsReceived', Integer)

    # The Assessment, Teacher, Subject and YearCycle classes are not shown here
    student = relationship("Student", backref=backref('studentscore'))
    school = relationship("School", backref=backref('studentscore'))
    assessment = relationship("Assessment", backref='studentscore')
    teacher = relationship("Teacher", backref='studentscore')
    subject = relationship("Subject", backref='studentscore')
    yearcycle = relationship("YearCycle", backref='studentscore')

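Whenever I query my data I get duplicated columns. For example, the school key is duplicated in the result of this ORM call, from which I then build a DataFrame: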

school = (
    session.query(StudentScore, School, Subject)
    .join(StudentScore.school)
    .join(StudentScore.subject)
    .filter(School.name.like('%Dever%'))
    .filter(Subject.code == 'Math')
)

a = pd.read_sql(school.statement, school.session.bind)

An SO thread suggests a neat transpose trick for removing the duplicates:

 a = a.T.drop_duplicates().T

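However, I still get an error when I interact with this DataFrame in my IDE's variable explorer. The error is: "Reindexing only valid with uniquely valued Index objects"

Any idea what the problem is?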

Best answer

Found the answer! Instead of the simplest approach:

 a = a.T.drop_duplicates().T

I switched to using groupby to remove the duplicates:

 a = a.T.groupby(level=0).first().T
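
A minimal sketch of what the groupby line does, on a made-up frame (the data and the names `demo` and `deduped` are illustrative, not taken from the original query). Grouping the transposed frame by its index label collapses rows that share a label, and `first()` keeps the first non-null value in each group, so every label ends up appearing once.

```python
import pandas as pd

# Hypothetical frame with SchoolKey present twice, as after the join
demo = pd.DataFrame(
    [[1, 1, "Dever"], [2, 2, "Dever"]],
    columns=["SchoolKey", "SchoolKey", "SchoolName"],
)

# Transpose so column labels become the index, collapse equal labels,
# then transpose back
deduped = demo.T.groupby(level=0).first().T

print(list(deduped.columns))
print(deduped.columns.is_unique)
```

Note that the round trip through `.T` mixes integers and strings in the same columns, so the result may come back with object dtype; calling `infer_objects()` on it afterwards is one way to recover narrower dtypes.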

That said, I'm not sure what caused my original error. Also, the new line runs 10-100x faster than the old one.
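
One plausible explanation for the original error, offered as a guess rather than a confirmed diagnosis: `a.T.drop_duplicates().T` compares the values in the rows of the transposed frame, not the column labels. Two columns that share a name but hold different values both survive, so the column index stays non-unique, and pandas operations that need unique labels can still fail. The frame below is invented to show this; the boolean-mask line is an alternative idiom that deduplicates by label, not something from the original answer.

```python
import pandas as pd

# Hypothetical frame: two columns named "id" that hold different values
demo = pd.DataFrame(
    [[1, "a", 10], [2, "b", 20]],
    columns=["id", "name", "id"],
)

# Value-based deduplication: no two rows of demo.T are equal,
# so both "id" columns are kept
by_value = demo.T.drop_duplicates().T
print(by_value.columns.is_unique)  # False

# Label-based deduplication: keep the first column for each label
by_label = demo.loc[:, ~demo.columns.duplicated()]
print(by_label.columns.is_unique)  # True
```

Checking `a.columns.is_unique` right after deduplicating is a cheap way to confirm which situation applies.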

For more on "python - Duplicate columns when querying SQLAlchemy into a Pandas DF?", see the similar question on Stack Overflow: https://stackoverflow.com/questions/30789492/
