
multithreading - Multithreading on a foreach loop?

Reposted. Author: 行者123. Updated: 2023-12-04 08:26:25

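I want to process some data. I have about 25k items in my dictionary. In a foreach loop, I query a database to get results for each item. The results are added to the dictionary as values.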

    foreach (KeyValuePair<string, Type> pair in allPeople)
    {
        MySqlCommand comd = new MySqlCommand("SELECT * FROM `logs` WHERE IP = '" + pair.Key + "' GROUP BY src", con);
        MySqlDataReader reader2 = comd.ExecuteReader();
        Dictionary<string, Dictionary<int, Log>> allViews = new Dictionary<string, Dictionary<int, Log>>();
        while (reader2.Read())
        {
            if (!allViews.ContainsKey(reader2.GetString("src")))
            {
                allViews.Add(reader2.GetString("src"), reader2.GetInt32("time"));
            }
        }
        reader2.Close();
        reader2.Dispose();
        allPeople[pair.Key].View = allViews;
    }

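I'd like to be able to do this faster with multithreading. I have 8 threads available, and CPU usage is about 13%. I just don't know whether it will work, because it relies on the MySQL server. On the other hand, maybe 8 threads would open 8 database connections and so be faster.

Anyway, if multithreading would help in my case, how? o.O I've never worked with (multiple) threads, so any help would be great :D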

Best answer

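MySqlDataReader is stateful: you call Read() on it and it moves to the next row, so each thread needs its own reader, and you need to concoct the queries so that the threads get different values. That might not be too hard, as you naturally have many queries with different values of pair.Key.

You also need either a temporary dictionary per thread, merged afterwards, or a lock to prevent concurrent modification of the dictionary.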
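
The per-thread dictionary option can be sketched as follows. This is only an illustration under assumptions: `MergeDemo`, `CountByFirstLetter` and the word list are hypothetical stand-ins for the database results, and the merge happens on the calling thread after each `Join`, so no lock is needed.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class MergeDemo
{
    // Counts words by first letter using threadCount threads. Each thread writes
    // only to its own dictionary; the results are merged after the threads finish.
    public static Dictionary<char, int> CountByFirstLetter(IList<string> words, int threadCount)
    {
        var partials = new Dictionary<char, int>[threadCount];
        var threads = new Thread[threadCount];

        for (int i = 0; i < threadCount; ++i)
        {
            int threadNumber = i; // private copy for the lambda below
            partials[threadNumber] = new Dictionary<char, int>();
            threads[threadNumber] = new Thread(() =>
            {
                Dictionary<char, int> local = partials[threadNumber];
                // thread k handles items k, k + threadCount, k + 2 * threadCount, ...
                for (int j = threadNumber; j < words.Count; j += threadCount)
                {
                    char key = words[j][0]; // assumes non-empty words
                    int n;
                    local.TryGetValue(key, out n); // n stays 0 when the key is absent
                    local[key] = n + 1;
                }
            });
            threads[threadNumber].Start();
        }

        var total = new Dictionary<char, int>();
        for (int i = 0; i < threadCount; ++i)
        {
            threads[i].Join(); // after Join, thread i's dictionary is safe to read
            foreach (KeyValuePair<char, int> kv in partials[i])
            {
                int n;
                total.TryGetValue(kv.Key, out n);
                total[kv.Key] = n + kv.Value;
            }
        }
        return total;
    }

    public static void Main()
    {
        var words = new[] { "apple", "avocado", "banana", "cherry", "cranberry", "cherimoya" };
        Dictionary<char, int> counts = CountByFirstLetter(words, 3);
        Console.WriteLine("a={0} b={1} c={2}", counts['a'], counts['b'], counts['c']); // prints a=2 b=1 c=3
    }
}
```

The alternative, a single shared dictionary guarded by `lock`, is shorter to write but makes the threads queue up on every insert.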

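The above assumes that MySQL will allow a single connection to perform concurrent queries; otherwise you may need multiple connections too.

First, though, I'd see what happens if you only ask the database for the data you need ( "SELECT src,time FROM `logs` WHERE IP = '" + pair.Key + "' GROUP BY src" ) and use GetString(0) and GetInt32(1) instead of looking up src and time by name; also, only fetch each value from the result once.

I'm also not sure about the logic: you are not ordering the log events by time, so which one is returned first (and therefore stored in the dictionary) could be any of them.

Something like this logic, where each of N threads only operates on every Nth pair, each thread has its own reader, and nothing actually changes allPeople itself, only the properties of the values in allPeople: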

    private void RunSubQuery(Dictionary<string, Type> allPeople, MySqlConnection con, int threadNumber, int threadCount)
    {
        int hoppity = 0; // used to hop over the keys not processed by this thread

        foreach (var pair in allPeople)
        {
            // each of the (threadCount) threads only processes every (threadCount)th key
            if ((hoppity % threadCount) == threadNumber)
            {
                // you may need con per thread, or it might be that you can share con; I don't know
                MySqlCommand comd = new MySqlCommand("SELECT src,time FROM `logs` WHERE IP = '" + pair.Key + "' GROUP BY src", con);

                using (MySqlDataReader reader = comd.ExecuteReader())
                {
                    var allViews = new Dictionary<string, Dictionary<int, Log>>();

                    while (reader.Read())
                    {
                        string src = reader.GetString(0);
                        int time = reader.GetInt32(1);

                        // do whatever to allViews with src and time
                    }

                    // no thread will be modifying the same pair.Value, so this is safe
                    pair.Value.View = allViews;
                }
            }

            ++hoppity;
        }
    }

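This is untested; I don't have MySQL on this machine, nor do I have your database and the other types you're using. It's also rather procedural (somewhat like how you would do it in Fortran with OpenMPI) rather than wrapping everything up in task objects.

You could start the threads like so: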
    void RunQuery(Dictionary<string, Type> allPeople, MySqlConnection connection)
    {
        lock (allPeople)
        {
            const int threadCount = 8; // the number of threads

            // if it takes 18 seconds currently and you're not at .net 4 yet, then you may as well create
            // the threads here as any saving of using a pool will not matter against 18 seconds
            //
            // it could be more efficient to use a pool so that each thread takes a pair off of
            // a queue, as doing it this way means that each thread has the same number of pairs to process,
            // and some pairs might take longer than others
            Thread[] threads = new Thread[threadCount];

            for (int i = 0; i < threadCount; ++i)
            {
                // copy the loop variable: the lambda captures the variable itself, not its
                // current value, so without the copy a thread could see a later value of i
                int threadNumber = i;
                threads[i] = new Thread(new ThreadStart(() => RunSubQuery(allPeople, connection, threadNumber, threadCount)));
                threads[i].Start();
            }

            // wait for all threads to finish
            for (int i = 0; i < threadCount; ++i)
            {
                threads[i].Join();
            }
        }
    }

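The extra lock held on allPeople is there so that there is a write barrier after all the threads return; I'm not quite sure whether it's needed. Any object would do.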
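
The code comment above mentions that letting each thread take the next pair off a queue could balance the load better than a fixed every-Nth split. A minimal sketch of that idea, assuming the work items are available as an indexable list: `WorkQueueDemo` and `ProcessAll` are hypothetical names, and a shared counter advanced with `Interlocked.Increment` plays the role of the queue.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class WorkQueueDemo
{
    // Processes every item exactly once using threadCount threads. Threads claim
    // the next unprocessed index from a shared counter, so a slow item delays
    // only the thread that picked it up.
    public static void ProcessAll<T>(IList<T> items, int threadCount, Action<T> process)
    {
        int next = -1; // index of the most recently claimed item
        var threads = new Thread[threadCount];

        for (int i = 0; i < threadCount; ++i)
        {
            threads[i] = new Thread(() =>
            {
                while (true)
                {
                    int index = Interlocked.Increment(ref next); // atomically claim an index
                    if (index >= items.Count) break;
                    process(items[index]);
                }
            });
            threads[i].Start();
        }

        foreach (Thread t in threads) t.Join();
    }

    public static void Main()
    {
        var data = new[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
        int sum = 0;
        ProcessAll(data, 4, x => Interlocked.Add(ref sum, x));
        Console.WriteLine(sum); // prints 55
    }
}
```

For the dictionary in the question, the pairs would first have to be copied into a list or array so they can be indexed.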

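None of this guarantees any performance gain. It might be that the MySQL libraries are single threaded, but the server can certainly handle multiple connections. Measure with different numbers of threads.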
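
One way to take that measurement on .net 4 or later is sketched below. `TimingDemo`, `Measure` and `FakeQuery` are hypothetical; `FakeQuery` just sleeps to imitate a round trip to the server and would be replaced by the real query, so the timings it prints say nothing about MySQL itself.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class TimingDemo
{
    // Stand-in for one database round trip; replace with the real query.
    static void FakeQuery(int item)
    {
        Thread.Sleep(5);
    }

    // Runs itemCount fake queries with at most maxThreads running at once
    // and returns the elapsed wall-clock time.
    public static TimeSpan Measure(int itemCount, int maxThreads)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxThreads };
        Stopwatch watch = Stopwatch.StartNew();
        Parallel.ForEach(Enumerable.Range(0, itemCount), options, item => FakeQuery(item));
        watch.Stop();
        return watch.Elapsed;
    }

    public static void Main()
    {
        foreach (int threads in new[] { 1, 2, 4, 8 })
        {
            Console.WriteLine("{0} thread(s): {1} ms", threads, Measure(200, threads).TotalMilliseconds);
        }
    }
}
```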

If you're using .net 4, then you don't have to mess around creating the threads or skipping the items you aren't working on:
    // this time using .net 4 parallel; assumes that connection is thread safe
    static void RunQuery(Dictionary<string, Type> allPeople, MySqlConnection connection)
    {
        Parallel.ForEach(allPeople, pair => RunPairQuery(pair, connection));
    }

    private static void RunPairQuery(KeyValuePair<string, Type> pair, MySqlConnection connection)
    {
        MySqlCommand comd = new MySqlCommand("SELECT src,time FROM `logs` WHERE IP = '" + pair.Key + "' GROUP BY src", connection);

        using (MySqlDataReader reader = comd.ExecuteReader())
        {
            var allViews = new Dictionary<string, Dictionary<int, Log>>();

            while (reader.Read())
            {
                string src = reader.GetString(0);
                int time = reader.GetInt32(1);

                // do whatever to allViews with src and time
            }

            // no iteration will be modifying the same pair.Value, so this is safe
            pair.Value.View = allViews;
        }
    }
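
If the shared connection turns out not to be thread safe (as far as I know, an ADO.NET connection generally allows only one open reader at a time, but verify this for your connector), `Parallel.ForEach` has an overload that gives each worker its own state. The sketch below uses a hypothetical `FakeConnection` class in place of `MySqlConnection` so that it runs without a database; `Person` and `PerWorkerConnectionDemo` are likewise made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for a database connection; in real code this would be a
// MySqlConnection opened with your own connection string.
public sealed class FakeConnection : IDisposable
{
    // Pretends to run a query for the given key.
    public int Query(string key) { return key.Length; }

    public void Dispose() { }
}

public static class PerWorkerConnectionDemo
{
    public sealed class Person { public int View; }

    public static void RunQuery(Dictionary<string, Person> allPeople)
    {
        Parallel.ForEach(
            allPeople,
            () => new FakeConnection(),               // localInit: one connection per worker task
            (pair, loopState, connection) =>
            {
                // each iteration touches a different pair.Value, as in the code above
                pair.Value.View = connection.Query(pair.Key);
                return connection;                    // hand the connection to the next iteration
            },
            connection => connection.Dispose());      // localFinally: close the worker's connection
    }

    public static void Main()
    {
        var people = new Dictionary<string, Person>
        {
            { "10.0.0.1", new Person() },
            { "192.168.1.20", new Person() }
        };
        RunQuery(people);
        Console.WriteLine(people["10.0.0.1"].View);      // prints 8
        Console.WriteLine(people["192.168.1.20"].View);  // prints 12
    }
}
```

The runtime decides how many worker tasks to create, so the number of connections opened this way is not fixed; `ParallelOptions.MaxDegreeOfParallelism` can cap how many run at once.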

Regarding "multithreading - Multithreading on a foreach loop?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/3186680/
