
Python NLTK on an Excel file

Reposted. Author: 太空宇宙. Updated: 2023-11-03 19:26:26

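I want to analyze text data in an Excel file. I know how to read an Excel file with Python, but each piece of data ends up as one value in a list. What I want is to analyze the text in each cell.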

Here is a sample of my Excel file:

NAME    INDUSTRY       INFO
A       FINANCIAL      THIS COMPANY IS BLA BLA BLA
B       MANUFACTURE    IT IS LALALALALALALALALA
C       FINANCIAL      THAT IS SOSOSOSOSOSOSOSO
D       AGRICULTURE    WHYWHYWHYWHYWHY

I would like to analyze, say, the financial industry's company info using NLTK, such as the frequency of "IT".

This is what I have so far (yes, it doesn't work!):

import xlrd
aa='c:/book3.xls'
wb = xlrd.open_workbook(aa)
wb.sheet_names()
sh = wb.sheet_by_index(0)

for rownum in range(sh.nrows):
    print nltk.word_tokenize(sh.row_values(rownum))

Best answer

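You are passing all the values in a row to word_tokenize, but you are only interested in the contents of the third column. You are also processing the heading row. Try this: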

import nltk
import xlrd

book = xlrd.open_workbook("your_input_file.xls")
sheet = book.sheet_by_index(0)
for row_index in range(1, sheet.nrows):  # start at 1 to skip the heading row
    # end_colx=3 limits the slice to the first three columns
    name, industry, info = sheet.row_values(row_index, end_colx=3)
    print("Row %d: name=%r industry=%r info=%r" %
          (row_index + 1, name, industry, info))
    print(nltk.word_tokenize(info))  # or whatever else you want to do
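
To get from there to the count the question asks about (how often a word such as "IT" appears for one industry), a minimal sketch could look like the following. It is not part of the original answer. The helper names `simple_tokenize` and `word_counts_for_industry` are hypothetical, the rows are typed in from the sample table rather than read from a file, and the tokenizer is a plain regular expression standing in for `nltk.word_tokenize` so that the sketch runs without NLTK or its tokenizer data installed.

```python
import re
from collections import Counter


def simple_tokenize(text):
    """Split text into word tokens (a stand-in for nltk.word_tokenize)."""
    return re.findall(r"[A-Za-z0-9']+", text)


def word_counts_for_industry(rows, industry, tokenize=simple_tokenize):
    """Count tokens in the INFO column of rows whose INDUSTRY matches.

    rows: iterable of (name, industry, info) tuples, for example built
    from sheet.row_values(i, end_colx=3) for each data row.
    """
    counts = Counter()
    for _name, row_industry, info in rows:
        # Compare case-insensitively so "Financial" and "FINANCIAL" match
        if row_industry.strip().upper() == industry.upper():
            counts.update(token.upper() for token in tokenize(info))
    return counts


# Sample rows copied from the question's spreadsheet
rows = [
    ("A", "FINANCIAL", "THIS COMPANY IS BLA BLA BLA"),
    ("B", "MANUFACTURE", "IT IS LALALALALALALALALA"),
    ("C", "FINANCIAL", "THAT IS SOSOSOSOSOSOSOSO"),
    ("D", "AGRICULTURE", "WHYWHYWHYWHYWHY"),
]

financial = word_counts_for_industry(rows, "FINANCIAL")
print(financial["IS"])  # "IS" occurs once in each of the two FINANCIAL rows
print(financial["IT"])  # a Counter returns 0 for tokens it never saw
```

Passing `tokenize=nltk.word_tokenize` should give NLTK's tokenization instead, and since `nltk.FreqDist` is a `Counter` subclass it could replace `Counter` here if its extra methods are wanted.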

Regarding Python NLTK on an Excel file, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/7943145/
