I have a Python script that reads a dataset and performs anomaly detection for various anomaly types (amount, coa, payee_name) on multiple organizations concurrently using concurrent.futures. The script is divided into several functions, and each function is responsible for detecting anomalies of a specific type within an organization.
The error is raised during the concatenation of anomaly DataFrames within the detect_anomalies_for_org function, specifically at the line:
return pd.concat(anomalies_list)
The problem I am facing is:
"ValueError: No objects to concatenate"
The code for the parallel processing is below:
result_list = []
with concurrent.futures.ProcessPoolExecutor(max_workers=num_cpus) as executor:
    futures = []
    for Org_id in dataset['Org_id'].unique():
        future = executor.submit(detect_anomalies_for_org, dataset.copy(), Org_id, anomaly_type_sets)
        futures.append(future)
        print(f"Submitted task for Org_id: {Org_id}")
    concurrent.futures.wait(futures)
    for future in futures:
        result = future.result()
        if not result.empty:
            result_list.append(result)
# Concatenate non-empty DataFrames
if result_list:
    merged_anomalies = pd.concat(result_list)
else:
    merged_anomalies = pd.DataFrame(columns=['id', 'Txn_id', 'Org_id', 'Client_id', 'AnomalyType', 'AnomalyScore', 'AnomalyCategory'])
# Save the merged anomalies to a CSV file
merged_anomalies.to_csv(output_file_path, index=False)
elapsed_time = time.time() - start_time
print(f"Execution time: {elapsed_time:.2f} seconds")
The output is as follows:
Traceback (most recent call last):
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/concurrent/futures/process.py", line 246, in _process_worker
r = call_item.fn(*call_item.args, **call_item.kwargs)
File "/tmp/ipykernel_10936/3957857038.py", line 177, in detect_anomalies_for_org
return pd.concat(anomalies_list)
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/pandas/core/reshape/concat.py", line 372, in concat
op = _Concatenator(
File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/pandas/core/reshape/concat.py", line 429, in __init__
raise ValueError("No objects to concatenate")
ValueError: No objects to concatenate
"""
The above exception was the direct cause of the following exception:
ValueError Traceback (most recent call last)
Cell In[3], line 202
199 concurrent.futures.wait(futures)
201 for future in futures:
--> 202 result = future.result()
203 if not result.empty:
204 result_list.append(result)
File ~/anaconda3/envs/python3/lib/python3.10/concurrent/futures/_base.py:451, in Future.result(self, timeout)
449 raise CancelledError()
450 elif self._state == FINISHED:
--> 451 return self.__get_result()
453 self._condition.wait(timeout)
455 if self._state in [CANCELLED, CANCELLED_AND_NOTIFIED]:
File ~/anaconda3/envs/python3/lib/python3.10/concurrent/futures/_base.py:403, in Future.__get_result(self)
401 if self._exception:
402 try:
--> 403 raise self._exception
404 finally:
405 # Break a reference cycle with the exception in self._exception
406 self = None
ValueError: No objects to concatenate
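The second half of the traceback shows future.result() re-raising the worker's exception in the parent process, so one failing organization aborts the whole collection loop. A minimal sketch of collecting results per task while recording failures is given here. The helper collect_results and the stand-in worker fake_detect are hypothetical names and are not part of the original script:

```python
import concurrent.futures


def collect_results(futures_by_key):
    """Split finished futures into successful results and per-key errors.

    `futures_by_key` maps a label (for example an Org_id) to its Future.
    """
    results, errors = {}, {}
    for key, future in futures_by_key.items():
        try:
            # result() blocks until done and re-raises the worker's exception
            results[key] = future.result()
        except Exception as exc:
            errors[key] = exc
    return results, errors


def fake_detect(org_id):
    """Hypothetical stand-in for detect_anomalies_for_org."""
    if org_id == 2:
        raise ValueError("No objects to concatenate")
    return [f"anomaly-for-org-{org_id}"]


if __name__ == "__main__":
    with concurrent.futures.ProcessPoolExecutor(max_workers=2) as executor:
        futures_by_key = {org: executor.submit(fake_detect, org) for org in (1, 2, 3)}
        results, errors = collect_results(futures_by_key)
    print("succeeded:", sorted(results))
    print("failed:", {org: repr(exc) for org, exc in errors.items()})
```

This only makes the failure visible per organization; it does not fix the empty list inside the worker, which still has to be handled where pd.concat is called.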
What I've Tried:
- I've verified that the input data (dataset) contains the expected values.
- I've reviewed the anomaly detection logic using Isolation Forest and checked intermediate results.
- I've attempted to handle edge cases where there might be no anomalies.
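One common way to handle the no-anomalies edge case is to guard the pd.concat call, since pandas raises "ValueError: No objects to concatenate" when it is given an empty list. A minimal sketch, assuming a hypothetical helper concat_or_empty (not from the original script) and reusing the column names from the fallback DataFrame in the parallel-processing code:

```python
import pandas as pd

# Column names copied from the fallback DataFrame in the question's main loop
ANOMALY_COLUMNS = ['id', 'Txn_id', 'Org_id', 'Client_id',
                   'AnomalyType', 'AnomalyScore', 'AnomalyCategory']


def concat_or_empty(frames, columns=ANOMALY_COLUMNS):
    """Concatenate non-empty DataFrames, or return an empty one with `columns`."""
    non_empty = [frame for frame in frames if frame is not None and not frame.empty]
    if not non_empty:
        # pd.concat([]) would raise "ValueError: No objects to concatenate"
        return pd.DataFrame(columns=columns)
    return pd.concat(non_empty, ignore_index=True)
```

Inside detect_anomalies_for_org, returning concat_or_empty(anomalies_list) instead of pd.concat(anomalies_list) would hand back an empty frame, which the existing "if not result.empty" check in the parent already skips. Since the body of that function is not shown, this is a guess at where the guard belongs.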
Comments:
Please post your code and output as code-formatted text. Images of code can not be accepted on Stack Overflow.
I have made the changes; I hope my question is in the right format now. Also, any idea how I can fix it?
I don't see the code for the function that is failing, detect_anomalies_for_org. Seriously, how do you expect the question to be answered? Anyway, what is likely happening is that you are using some global variable to concatenate your dataframes into, and expecting it to exist in the subprocess workers: global variables do exist in the subprocesses, but each process will have its own copy.
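The point about per-process copies can be illustrated with a minimal sketch. The names shared_list, append_to_global, and return_value are hypothetical and not from the original script; the sketch only shows that appends made inside worker processes do not reach the parent's list, whereas returned values do:

```python
import concurrent.futures

shared_list = []  # module-level global; each worker process gets its own copy


def append_to_global(value):
    """Append to the global list in whichever process runs this function."""
    shared_list.append(value)
    return len(shared_list)


def return_value(value):
    """Return the result so the executor sends it back to the parent."""
    return value * 10


if __name__ == "__main__":
    with concurrent.futures.ProcessPoolExecutor(max_workers=2) as executor:
        # These appends happen in the workers' copies of shared_list
        list(executor.map(append_to_global, [1, 2, 3]))
        # Returned values are pickled and sent back to the parent
        returned = list(executor.map(return_value, [1, 2, 3]))
    # The parent's list is untouched by the workers' appends
    print("parent shared_list:", shared_list)
    print("returned values:", returned)
```

If the original function does build anomalies_list from a global, passing the data in as arguments and returning the resulting DataFrame from the worker would avoid relying on shared state.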