python - How do I run a keyword search and word counter over all CSV files in a directory and write the results to a single CSV?

Reposted by: 行者123. Updated: 2023-12-04 04:27:35

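I'm new to Python and trying to get to know some of its libraries. I'm not sure how to upload a CSV to SO, but this script works with any CSV; just replace "SwitchedProviders_TopicModel".
My goal is to loop over every CSV in the directory C:\Users\jj\Desktop\autotranscribe and write the output of my Python script, per file, to a single CSV.
For example, say the folder above contains these CSV files:
'1003391793_1003391784_01bc7e411408166f7c5468f0.csv'
'1003478130_1003478103_8eef05b0820cf0ffe9a9754c.csv'
'1003478130_1003478103_8eef05b0820cf0ffe9a9882d.csv'
I want my Python application (below) to run a word counter on each CSV in the folder/directory and write the output to a dataframe like this: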

csvname                                            pre existing  exclusions  limitations  fourteen
1003391793_1003391784_01bc7e411408166f7c5468f0.csv    1           2           0            1

My script:

import pandas as pd
from collections import defaultdict

def search_multiple_strings_in_file(file_name, list_of_strings):
    """Get line from the file along with line numbers, which contains any string from the list"""
    line_number = 0
    list_of_results = []
    count = defaultdict(lambda: 0)
    # Open the file in read only mode
    with open(file_name, 'r') as read_obj:
        # Read all lines in the file one by one
        for line in read_obj:
            line_number += 1
            # For each line, check if line contains any string from the list of strings
            for string_to_search in list_of_strings:
                if string_to_search in line:
                    count[string_to_search] += line.count(string_to_search)
                    # If any string is found in line, then append that line along with line number in list
                    list_of_results.append((string_to_search, line_number, line.rstrip()))
 
    # Return list of tuples containing matched string, line numbers and lines where string is found
    return list_of_results, dict(count)


matched_lines, count = search_multiple_strings_in_file('SwitchedProviders_TopicModel.csv', [ 'pre existing ', 'exclusions','limitations','fourteen'])
    
df = pd.DataFrame.from_dict(count, orient='index').reset_index()
df.columns = ['Word', 'Count']

print(df)

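How can I achieve this? I only want counts for specific words, such as "fourteen" in my script, not a counter over all words.
Sample data from one of the CSVs (credit to user Umar H):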

df = pd.read_csv('1003478130_1003478103_8eef05b0820cf0ffe9a9754c.csv')
print(df.head(10).to_dict())
{'transcript': {0: 'hi thanks for calling ACCA  this is many speaking could have the pleasure speaking with ', 1: 'so ', 2: 'hi ', 3: 'I have the pleasure speaking with my name is B. as in boy E. V. D. N. ', 4: 'thanks yes and I think I have your account pulled up could you please verify your email ', 5: "sure is yeah it's on _ 00 ", 6: 'I T. O.com ', 7: 'thank you how can I help ', 8: 'all right I mean I do have an insurance with you guys I just want to cancel the insurance ', 9: 'sure I can help with that what was the reason for cancellation '}, 'confidence': {0: 0.73, 1: 0.18, 2: 0.88, 3: 0.72, 4: 0.83, 5: 0.76, 6: 0.83, 7: 0.98, 8: 0.89, 9: 0.95}, 'from': {0: 1.69, 1: 1.83, 2: 2.06, 3: 2.13, 4: 2.36, 5: 2.98, 6: 3.17, 7: 3.65, 8: 3.78, 9: 3.93}, 'to': {0: 1.83, 1: 2.06, 2: 2.13, 3: 2.36, 4: 2.98, 5: 3.17, 6: 3.65, 7: 3.78, 8: 3.93, 9: 4.14}, 'speaker': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0, 9: 0}, 'Negative': {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0, 5: 0.0, 6: 0.0, 7: 0.0, 8: 0.116, 9: 0.0}, 'Neutral': {0: 0.694, 1: 1.0, 2: 1.0, 3: 0.802, 4: 0.603, 5: 0.471, 6: 1.0, 7: 0.366, 8: 0.809, 9: 0.643}, 'Positive': {0: 0.306, 1: 0.0, 2: 0.0, 3: 0.198, 4: 0.397, 5: 0.529, 6: 0.0, 7: 0.634, 8: 0.075, 9: 0.357}, 'compound': {0: 0.765, 1: 0.0, 2: 0.0, 3: 0.5719, 4: 0.7845, 5: 0.5423, 6: 0.0, 7: 0.6369, 8: -0.1779, 9: 0.6124}}

Best answer

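Since you tagged pandas, we can use .str.extractall to search for the keywords and the row numbers where they occur.
You can extend the function and add some error handling (for example, what should happen if a given CSV file has no transcript column).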
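To see what .str.extractall returns before reading the function below, here is a small sketch on a made-up Series. The sample sentences and the `counts` step are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Toy transcript column; these rows are invented for illustration.
transcript = pd.Series([
    "no exclusions and no limitations apply",
    "hello there",
    "fourteen days, with exclusions",
])

key_words = ["exclusions", "limitations", "fourteen"]
# One capture group that matches any of the keywords.
pattern = f"({'|'.join(key_words)})"

matches = transcript.str.extractall(pattern)
# `matches` has a two-level index (original row label, match number within the row)
# and a single column 0 holding the keyword that matched.
print(matches)

# Counting how often each keyword matched is a value_counts() on column 0.
counts = matches[0].value_counts()
print(counts)
```

Rows without any match (the second row here) simply do not appear in the result.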

from pathlib import Path
import pandas as pd

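def get_files_to_parse(start_dir : str) -> list:
    # Collect every .csv file directly inside start_dir (no recursion).
    files = [f for f in Path(start_dir).glob('*.csv')]
    return files

def search_multiple_files(list_of_paths : list,key_words : list) -> pd.DataFrame:
    dfs = []
    for file in list_of_paths:
        df = pd.read_csv(file)
        # extractall returns one row per match, indexed by (row label, match number).
        # Drop the match number, turn the row label into a column named after the file,
        # then transpose so the matched keywords become the column labels.
        word_df = df['transcript'].str.extractall(f"({'|'.join(key_words)})")\
                        .droplevel(1,0)\
                        .reset_index()\
                        .rename(columns={'index' : f"{file.parent}_{file.stem}"})\
                        .set_index(0).T
        dfs.append(word_df)
    # Stack the per-file rows into one dataframe.
    return pd.concat(dfs)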
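The error handling the answer mentions could look roughly like the following. This is a minimal sketch: the helper name `read_transcript` and the choice to skip unusable files with a printed message are assumptions for illustration, not part of the original answer.

```python
from pathlib import Path
from typing import Optional

import pandas as pd

def read_transcript(path: Path, column: str = "transcript") -> Optional[pd.Series]:
    """Return the text column of one CSV, or None if the file cannot be used.

    Hypothetical helper: skipping (rather than raising) is a design assumption.
    """
    try:
        df = pd.read_csv(path)
    except (pd.errors.EmptyDataError, pd.errors.ParserError) as exc:
        # Empty or malformed file: report it and move on.
        print(f"skipping {path}: {exc}")
        return None
    if column not in df.columns:
        print(f"skipping {path}: no '{column}' column")
        return None
    # Replace missing cells with empty strings so the .str methods never see NaN.
    return df[column].fillna("").astype(str)
```

Inside the loop of search_multiple_files, a None result would then mean `continue` instead of calling .str.extractall on a column that is not there.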

Usage.
Using your sample dataframe (I added a few keywords from your list):

files = get_files_to_parse(r'target\dir\folder')  # raw string, so \d and \f are not treated as escapes


[WindowsPath('1003478130_1003478103_8eef05b0820cf0ffe9a9754c.csv'),
 WindowsPath('1003478130_1003478103_8eef05b0820cf0ffe9a9754c_copy.csv')]

search_multiple_files(files,['pre existing', 'exclusions','limitations','fourteen'])
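The question asks for one row per CSV with a count for each keyword, written to a single CSV. A minimal sketch of that variant follows. It swaps .str.extractall for .str.count; the function name `count_keywords_per_file` and the names in the usage comments are hypothetical.

```python
from pathlib import Path
import re

import pandas as pd

def count_keywords_per_file(list_of_paths, key_words, column="transcript"):
    """Build one row per CSV and one column per keyword, holding occurrence counts."""
    rows = {}
    for file in list_of_paths:
        text = pd.read_csv(file)[column].fillna("").astype(str)
        # Series.str.count takes a regex, so each keyword is escaped to match literally.
        rows[Path(file).name] = {
            kw: int(text.str.count(re.escape(kw)).sum()) for kw in key_words
        }
    # Dict of dicts -> dataframe with file names as the index.
    result = pd.DataFrame.from_dict(rows, orient="index").reindex(columns=key_words)
    result.index.name = "csvname"
    return result

# Usage sketch ('target_dir' and 'keyword_counts.csv' are placeholder names):
# files = sorted(Path("target_dir").glob("*.csv"))
# counts = count_keywords_per_file(
#     files, ["pre existing", "exclusions", "limitations", "fourteen"])
# counts.to_csv("keyword_counts.csv")
```

Like line.count in the question's script, this counts substrings, so "fourteen" would also be counted inside a longer word such as "fourteenth".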

This post is based on a similar question we found on Stack Overflow: https://stackoverflow.com/questions/67287252/
