
PHP: reading a txt file with a pattern and retaining the information

Reposted. Author: 行者123. Updated: 2023-11-29 06:16:34

Sorry for the confusing title, but I couldn't think of a better one.

I have a text file in this format (just a few lines taken out of context):

# Google_Product_Taxonomy_Version: 2015-02-19
1 - Animals & Pet Supplies
3237 - Animals & Pet Supplies > Live Animals
2 - Animals & Pet Supplies > Pet Supplies
3 - Animals & Pet Supplies > Pet Supplies > Bird Supplies
7385 - Animals & Pet Supplies > Pet Supplies > Bird Supplies > Bird Cage Accessories
499954 - Animals & Pet Supplies > Pet Supplies > Bird Supplies > Bird Cage Accessories > Bird Cage Bird Baths
7386 - Animals & Pet Supplies > Pet Supplies > Bird Supplies > Bird Cage Accessories > Bird Cage Food & Water Dishes
4989 - Animals & Pet Supplies > Pet Supplies > Bird Supplies > Bird Cages & Stands
4990 - Animals & Pet Supplies > Pet Supplies > Bird Supplies > Bird Food

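So far, so good. I want to write a parser that captures all the information for each category. Once that is done, the result has to be written to a MySQL database.

Each line has exactly: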

1 unique ID
1 Main-category
n sub-categories

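The tricky part (for me) is how to retain this information and hold it in an array, also with performance in mind.

My database must end up with output like this: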

ID   | parent | title
1    |        | Animals & Pet Supplies
3237 | 1      | Live Animals
2    | 1      | Pet Supplies
3    | 2      | Bird Supplies

In fact, I must be able to reproduce this "breadcrumb" from my DB entries.

My parser starts like this:
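To make the breadcrumb requirement concrete, here is a minimal sketch of how a breadcrumb could be rebuilt from rows shaped like the table above. The function name `buildBreadcrumb` and the in-memory `$rows` array (keyed by ID) are assumptions for illustration; in practice the rows would come from the database, and the data is assumed to contain no parent cycles.

```php
<?php
// Hypothetical helper: rebuild the breadcrumb of one category from rows
// keyed by id, each holding its parent id (or null) and its title.
function buildBreadcrumb(array $rows, int $id): string
{
    $parts = [];
    $current = $id;

    // Walk up the parent chain until a row without a parent is reached.
    while ($current !== null && isset($rows[$current])) {
        array_unshift($parts, $rows[$current]['title']);
        $current = $rows[$current]['parent'];
    }

    return implode(' > ', $parts);
}

// Sample rows mirroring the desired table.
$rows = [
    1    => ['parent' => null, 'title' => 'Animals & Pet Supplies'],
    3237 => ['parent' => 1,    'title' => 'Live Animals'],
    2    => ['parent' => 1,    'title' => 'Pet Supplies'],
    3    => ['parent' => 2,    'title' => 'Bird Supplies'],
];

echo buildBreadcrumb($rows, 3), "\n";
```

For ID 3 this walks 3, then 2, then 1, so storing only the parent ID per row is enough to recover the full path.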

public function enrichTaxonomy()
{
    $aOutput = array();

    // $handle is assumed to be an already opened handle to the taxonomy file
    // ignore first line
    fgets($handle);

    // iterate through it
    while (($line = fgets($handle)) !== false)
    {
        $splitted = explode("-", $line);

        // build first level
        if (strpos($splitted[1], '>') === false)
        {
            $aOutput['id'][] = trim($splitted[0]);
            $aOutput['title'][] = trim($splitted[1]);
        } else
        {
            // recursive?
            if (substr_count($splitted[1], " > ") == 1)
            {
                $splitted2ndLevel = explode(" > ", $splitted[1]);
                $aOutput['id'][] = trim($splitted[0]);
                $aOutput['title'][] = trim($splitted2ndLevel[1]);
            }
        }
    }

    echo "<pre>";
    var_dump($aOutput);
    echo "</pre>";
}

But I realized this is not a good approach, because my next steps would have to be:

if (substr_count($splitted[1], " > ") == 2)
{
    $splitted3rdLevel = explode(" > ", $splitted[1]);
    $aOutput['id'][] = trim($splitted[0]);
    $aOutput['title'][] = trim($splitted3rdLevel[2]);
}

if (substr_count($splitted[1], " > ") == 3)
{
    $splitted4thLevel = explode(" > ", $splitted[1]);
    $aOutput['id'][] = trim($splitted[0]);
    $aOutput['title'][] = trim($splitted4thLevel[3]);
}

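Besides, this seems to get very complicated later on, when I try to end up with one final array that I can iterate over to insert the data into my database.

An important point is that every "sub-category" must know its "parent", so that I can insert the parent ID as well.

My question now: what is a good, (relatively) short, and efficient way to achieve this?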
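The per-level checks all pick the last element after splitting on ">", so they can be collapsed into one step that works for any depth. The following is only a sketch; the helper name `splitTaxonomyLine` and the returned keys are made up for illustration, and each line is assumed to have the form "ID - breadcrumb".

```php
<?php
// Hypothetical helper: split one taxonomy line into id, title and depth,
// regardless of how many " > " separators it contains.
function splitTaxonomyLine(string $line): array
{
    // Limit 2: only the first " - " separates the id from the breadcrumb,
    // so a hyphen inside a title does not cut the title short.
    [$id, $crumb] = explode(' - ', trim($line), 2);
    $parts = array_map('trim', explode('>', $crumb));

    return [
        'id'    => (int) $id,
        'title' => end($parts),   // the last crumb is the category itself
        'depth' => count($parts), // 1 for a main category
    ];
}

print_r(splitTaxonomyLine('3 - Animals & Pet Supplies > Pet Supplies > Bird Supplies'));
```

The depth is returned only to show that no separate branch per level is needed; the parent lookup is a separate concern.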

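Best answer

There is no need to build a tree structure when you would have to flatten it again to insert it into the database. Instead, create the same structure as the database: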

$input = <<<'EOD'
1 - Animals & Pet Supplies
3237 - Animals & Pet Supplies > Live Animals
2 - Animals & Pet Supplies > Pet Supplies
3 - Animals & Pet Supplies > Pet Supplies > Bird Supplies
7385 - Animals & Pet Supplies > Pet Supplies > Bird Supplies > Bird Cage Accessories
499954 - Animals & Pet Supplies > Pet Supplies > Bird Supplies > Bird Cage Accessories > Bird Cage Bird Baths
7386 - Animals & Pet Supplies > Pet Supplies > Bird Supplies > Bird Cage Accessories > Bird Cage Food & Water Dishes
4989 - Animals & Pet Supplies > Pet Supplies > Bird Supplies > Bird Cages & Stands
4990 - Animals & Pet Supplies > Pet Supplies > Bird Supplies > Bird Food
EOD;

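$dbInput = [];

$lines = explode("\n", $input);
// or for a file: $lines = file('file.path', FILE_IGNORE_NEW_LINES);

foreach ($lines as $line) {
    // skip blank lines and the "# ..." version header
    if (trim($line) === '' || substr($line, 0, 1) == '#') continue;

    // split on the first " - " only, so hyphens inside a title survive
    list($id, $crumb) = explode(' - ', $line, 2);
    $id = trim($id);
    $crumb_parts = array_map('trim', explode('>', $crumb));
    $title = array_pop($crumb_parts);   // last crumb is this category
    $parent = array_pop($crumb_parts);  // the one before it is its parent (null at top level)
    $parent_id = ($parent !== null && isset($dbInput[$parent])) ? $dbInput[$parent][':id'] : null;

    $dbInput[$title] = [
        ':id' => $id,
        ':parent' => $parent_id,
        ':title' => $title,
    ];
}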
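$pdo = new PDO('mysql:host=localhost;dbname=dbname', 'usr', 'pass');
// throw exceptions on failure instead of failing silently
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$sth = $pdo->prepare("INSERT INTO tree (id, parent, title) VALUES (:id, :parent, :title)");
// iterate by value; a reference is not needed here
foreach ($dbInput as $row) {
    $sth->execute($row);
}
echo 'done';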
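One caveat: the code above keys $dbInput by title, so two categories sharing a title under different parents would overwrite each other. If that can happen in the full file, a variant that looks parents up by their full breadcrumb path may be safer. The sketch below is an alternative, not the answer's original method; the function name `parseTaxonomy` is hypothetical, and it assumes every line has the form "ID - breadcrumb" and that parents are listed before their children, as in the sample.

```php
<?php
// Sketch: same flat row structure as above, but parents are resolved by
// their full breadcrumb path rather than by title alone.
function parseTaxonomy(iterable $lines): array
{
    $idByPath = []; // full breadcrumb path => category id
    $rows = [];

    foreach ($lines as $line) {
        $line = trim($line);
        // Skip blank lines and the "# ..." version header.
        if ($line === '' || $line[0] === '#') {
            continue;
        }

        [$id, $crumb] = explode(' - ', $line, 2);
        $parts = array_map('trim', explode('>', $crumb));
        $title = array_pop($parts);

        // Normalised path of the parent ('' for a main category).
        $parentPath = implode(' > ', $parts);
        $path = $parentPath === '' ? $title : $parentPath . ' > ' . $title;

        $idByPath[$path] = (int) $id;
        $rows[] = [
            ':id'     => (int) $id,
            ':parent' => $idByPath[$parentPath] ?? null,
            ':title'  => $title,
        ];
    }

    return $rows;
}

$rows = parseTaxonomy([
    '# Google_Product_Taxonomy_Version: 2015-02-19',
    '1 - Animals & Pet Supplies',
    '3237 - Animals & Pet Supplies > Live Animals',
    '2 - Animals & Pet Supplies > Pet Supplies',
    '3 - Animals & Pet Supplies > Pet Supplies > Bird Supplies',
]);

print_r($rows[3]);
```

Because the function accepts any iterable, it could also be fed an `SplFileObject` to read the file line by line instead of loading it into an array first. Each returned row has the same three named parameters as the prepared statement above, so it can be passed to `execute()` unchanged.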


Regarding "PHP: reading a txt file with a pattern and retaining the information", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/35581598/
