php - How to process a CSV with over 100k rows in PHP?


I have a CSV file with more than 100,000 rows; each row has 3 values separated by semicolons. The total file size is approx. 5 MB.

The CSV file is in this format:

stock_id;product_id;amount
==========================
1;1234;0
1;1235;1
1;1236;0
...
2;1234;3
2;1235;2
2;1236;13
...
3;1234;0
3;1235;2
3;1236;0
...

We have 10 stocks, indexed 1-10 in the CSV. In the database they are saved as 22-31.

The CSV is sorted by stock_id and product_id, but I don't think that matters.

What I have:

<?php

session_start();

require_once('db.php');

echo '<meta charset="iso-8859-2">';

// convert table: `CSV stock id => DB stock id`
$stocks = array(
    1  => 22,
    2  => 23,
    3  => 24,
    4  => 25,
    5  => 26,
    6  => 27,
    7  => 28,
    8  => 29,
    9  => 30,
    10 => 31
);

$products = array(); // lookup table of product IDs known to the database

$sql = $mysqli->query("SELECT product_id FROM table WHERE fielddef_id = 1");

while ($row = $sql->fetch_assoc()) {
    $products[$row['product_id']] = 1;
}

$csv = file('export.csv');

// go thru CSV file and prepare SQL UPDATE query
foreach ($csv as $row) {
    $data = explode(';', $row);
    // $data[0] - stock_id
    // $data[1] - product_id
    // $data[2] - amount

    if (isset($products[$data[1]])) {
        // the CSV contains products which aren't in the database
        // this echo should show me the queries
        echo " UPDATE t
               SET value = " . (int)$data[2] . "
               WHERE fielddef_id = " . (int)$stocks[$data[0]] . " AND
                     product_id = '" . $data[1] . "' -- product_id isn't just numeric
               LIMIT 1<br>";
    }
}

The problem is that printing 100k rows with echo is too slow and takes forever. I'm not sure what MySQL will do, whether it will be faster or take roughly the same time. I have no test machine here, so I'm afraid of testing on the production server.
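(As an aside: if the echo is only there to inspect the generated queries, writing them to a file is typically much faster than printing 100k lines to the browser. An untested sketch, reusing the $csv, $products and $stocks variables from the code above:)

$fp = fopen('queries.sql', 'w');
foreach ($csv as $row) {
    $data = explode(';', $row);
    if (isset($products[$data[1]])) {
        // append one UPDATE statement per matching row instead of echoing it
        fwrite($fp, "UPDATE t SET value = " . (int)$data[2] .
                    " WHERE fielddef_id = " . (int)$stocks[$data[0]] .
                    " AND product_id = '" . $data[1] . "' LIMIT 1;\n");
    }
}
fclose($fp);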

My idea was to load the CSV file into several variables (or better, arrays) like below, but I'm not sure it would actually help; a rough sketch of that splitting follows the list.

$csv[0] = lines 0      - 10.000;
$csv[1] = lines 10.001 - 20.000;
$csv[2] = lines 20.001 - 30.000;
$csv[3] = lines 30.001 - 40.000;
etc.
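A minimal, untested sketch of that splitting idea using array_chunk(); note that splitting by itself does not make the database updates any faster, it only lets you process the file in blocks:

$lines  = file('export.csv', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
$chunks = array_chunk($lines, 10000); // $chunks[0] = lines 0-9999, $chunks[1] = lines 10000-19999, ...

foreach ($chunks as $i => $chunk) {
    // process one block of up to 10,000 lines here
}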

I found e.g. Efficiently counting the number of lines of a text file. (200mb+), but I'm not sure how that would help me.

When I replace the foreach with print_r, I get the dump in under 1 second. The task is to make the foreach loop with the database updates faster.

Any idea how to update this many records in the database?
Thanks.

Best Answer

Something like this (please note this is 100% untested, written off the top of my head, and may need some tweaking to actually work :))

// define array map (there are probably better ways of doing this)
$stocks = array(
    1  => 22,
    2  => 23,
    3  => 24,
    4  => 25,
    5  => 26,
    6  => 27,
    7  => 28,
    8  => 29,
    9  => 30,
    10 => 31
);

$handle = fopen("file.csv", "r"); // open file
while (($data = fgetcsv($handle, 1000, ";")) !== FALSE) {
    // loop through csv
    $updatesql = "UPDATE t SET `value` = " . $data[2] . " WHERE fielddef_id = " . $stocks[$data[0]] . " AND product_id = '" . $data[1] . "'";
    echo "$updatesql<br>"; // for debug only, comment out on live
}

There's no need for the initial SELECT, since you only ever set the product data to 1 in your code anyway, and from your description your product_id is always correct; it's just the fielddef column that has the mapping.

Also, don't forget to put an actual mysqli execute call in for $updatesql.
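For example (an untested sketch, assuming the $mysqli connection from the question): either run each built query directly with $mysqli->query($updatesql), or, usually much faster over 100k rows, prepare the UPDATE once and execute it per row inside a single transaction so each row is not auto-committed individually:

$stmt = $mysqli->prepare("UPDATE t SET `value` = ? WHERE fielddef_id = ? AND product_id = ? LIMIT 1");
$stmt->bind_param('iis', $amount, $fielddefId, $productId);

$mysqli->begin_transaction();
$handle = fopen("file.csv", "r");
while (($data = fgetcsv($handle, 1000, ";")) !== FALSE) {
    $amount     = (int) $data[2];
    $fielddefId = $stocks[$data[0]];
    $productId  = $data[1]; // bound as a string since product_id isn't just numeric
    $stmt->execute();
}
fclose($handle);
$mysqli->commit(); // one commit for the whole file instead of one per row
$stmt->close();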

To give you a comparison against code in real use (which I can benchmark!), here is some code I use for an importer of an uploaded file (it isn't perfect, but it does its job):

if (isset($_POST['action']) && $_POST['action'] == "beginimport") {
    echo "<h4>Starting Import</h4><br />";
    // Ignore user abort and expand time limit
    //ignore_user_abort(true);
    set_time_limit(60);
    if (($handle = fopen($_FILES['clientimport']['tmp_name'], "r")) !== FALSE) {
        $row = 0;
        // defaults
        $sitetype   = 3;
        $sitestatus = 1;
        $startdate  = "2013-01-01 00:00:00";
        $enddate    = "2013-12-31 23:59:59";
        $createdby  = 1;
        // loop and insert
        while (($data = fgetcsv($handle, 10000, ",")) !== FALSE) { // loop through each line of CSV. Returns array of that line each time so we can hard reference it if we want.
            if ($row > 0) {
                if (strlen($data[1]) > 0) {
                    $clientshortcode = mysqli_real_escape_string($db->mysqli, trim(stripslashes($data[0])));
                    $sitename = mysqli_real_escape_string($db->mysqli, trim(stripslashes($data[0])) . " " . trim(stripslashes($data[1])));
                    $address = mysqli_real_escape_string($db->mysqli, trim(stripslashes($data[1])) . "," . trim(stripslashes($data[2])) . "," . trim(stripslashes($data[3])));
                    $postcode = mysqli_real_escape_string($db->mysqli, trim(stripslashes($data[4])));
                    // look up client ID
                    $client = $db->queryUniqueObject("SELECT ID FROM tblclients WHERE ShortCode='$clientshortcode'", ENABLE_DEBUG);

                    if ($client->ID > 0 && is_numeric($client->ID)) {
                        // got client ID, so now check if site already exists; we can trust the site name here since we only care about double matching against already imported sites.
                        $sitecount = $db->countOf("tblsites", "SiteName='$sitename'");
                        if ($sitecount > 0) {
                            // site exists
                            echo "<strong style=\"color:orange;\">SITE $sitename ALREADY EXISTS SKIPPING</strong><br />";
                        } else {
                            // site doesn't exist so do import
                            $db->execute("INSERT INTO tblsites (SiteName,SiteAddress,SitePostcode,SiteType,SiteStatus,CreatedBy,StartDate,EndDate,CompanyID) VALUES
                                ('$sitename','$address','$postcode',$sitetype,$sitestatus,$createdby,'$startdate','$enddate'," . $client->ID . ")", ENABLE_DEBUG);
                            echo "IMPORTED - " . $data[0] . " - " . $data[1] . "<br />";
                        }
                    } else {
                        echo "<strong style=\"color:red;\">CLIENT $clientshortcode NOT FOUND PLEASE ENTER AND RE-IMPORT</strong><br />";
                    }
                    fcflush(); // custom output-flush helper (not a PHP builtin)
                    set_time_limit(60); // reset timer on loop
                }
            } else {
                $row++;
            }
        }
        echo "<br />COMPLETED<br />";
    }
    fclose($handle);
    unlink($_FILES['clientimport']['tmp_name']);
    echo "All Imports finished do not reload this page";
}

It imported 150k rows in about 10 seconds.

Regarding php - How to process a CSV with over 100k rows in PHP?, see the original question on Stack Overflow: https://stackoverflow.com/questions/29794468/
