php - How to extract data from this string

Reposted. Author: 搜寻专家. Updated: 2023-10-31 21:25:59

I'm not good at writing patterns to extract data. I have a very long document, and below is the specific string I need to extract from.

<p><span id="minPrice">XXXX<a href="YYYYY" target="_blank"><span>&yen;ZZZZZ</span></a></span>

I want to extract the XXXX, YYYYY, and ZZZZZ values.

My first step is to get XXXX<a href="YYYYY" target="_blank"><span>&yen;ZZZZZ

$pattern = '/<p><span id="minPrice">^</span></a></span>/';
preg_match($pattern, $data, $matches);
echo ($matches[1]);

But it doesn't work. So how can I extract XXXX, YYYYY, and ZZZZZ? :(

My document is full of incorrectly encoded characters, so I can't use loadHTML. It just returns errors.

Update 1: So I was able to do this:

var_dump(libxml_use_internal_errors(true));
$DOM = new DOMDocument;
$DOM->loadHTML($data);
$items = $DOM->getElementById('minPrice');

$items is

 DOMElement Object
(
[tagName] => span
[schemaTypeInfo] =>
[nodeName] => span
[nodeValue] => 最安価格(税込):¥131,649
[nodeType] => 1
[parentNode] => (object value omitted)
[childNodes] => (object value omitted)
[firstChild] => (object value omitted)
[lastChild] => (object value omitted)
[previousSibling] =>
[nextSibling] => (object value omitted)
[attributes] => (object value omitted)
[ownerDocument] => (object value omitted)
[namespaceURI] =>
[prefix] =>
[localName] => span
[baseURI] =>
[textContent] => 最安価格(税込):¥131,649
)

The HTML is

<span id="minPrice">
�ň����i(�ō�)�F
<a href="http://kakaku.com/shop/1115/?pdid=K0000693648&lid=shop_itemview_saiyasukakaku" target="_blank">
<span>&yen;131,649</span>
</a>
</span>

How do I extract http://kakaku.com/shop/1115/?pdid=K0000693648&lid=shop_itemview_saiyasukakaku and 131,649?

Best answer

You can enable internal error handling for the DOM parser with this line of code:

libxml_use_internal_errors(true);
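To see what that call actually suppresses, here is a minimal sketch that buffers the parser errors, prints them, and then clears the buffer. The sample markup and the example.com URL are made up for illustration; which messages libxml reports (if any) depends on the installed libxml2 version, so none are shown here.

```php
<?php
// Hypothetical sample markup; the unescaped "&" in the href is the kind of
// thing libxml may complain about.
$html = '<p><span id="minPrice">X<a href="http://example.com/?a=1&b=2" target="_blank"><span>&yen;100</span></a></span>';

// Buffer parser errors instead of emitting PHP warnings.
// The return value is the previous setting, kept so it can be restored.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$loaded = $dom->loadHTML($html);

// Print whatever the parser buffered, then empty the buffer.
foreach (libxml_get_errors() as $error) {
    printf("line %d: %s\n", $error->line, trim($error->message));
}
libxml_clear_errors();

// Restore the earlier error-handling mode.
libxml_use_internal_errors($previous);

$span = $dom->getElementById('minPrice');
echo $span !== null ? "found\n" : "not found\n";
```

Clearing the buffer matters in long-running scripts, since buffered errors otherwise accumulate across parses.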

Then you can access the data you need with this sample code:

$html = <<<DATA
<p><span id="minPrice">最安価格(税込):<a href="http://kakaku.com/shop/1115/?pdid=K0000693648&lid=shop_itemview_saiyasukakaku" target="_blank"><span>&yen;131,649</span></a></span>
DATA;

$dom = new DOMDocument('1.0', 'UTF-8');
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

$xpath = new DOMXPath($dom);
$spans = $xpath->query('//span[@id="minPrice"]'); // Get all spans with ID=minPrice
$a = array();
foreach ($spans as $span) {
    foreach ($span->childNodes as $child) { // Check the child nodes
        if ($child->nodeName == "a") {
            // The link target
            array_push($a, $child->getAttribute("href"));
            // The trailing number in the link text (digits with optional comma groups)
            array_push($a, preg_replace('~^.*?(\d+(?:,\d+)*)$~u', '$1', $child->nodeValue));
        }
    }
}

print_r($a);

Result:

Array
(
[0] => http://kakaku.com/shop/1115/?pdid=K0000693648&lid=shop_itemview_saiyasukakaku
[1] => 131,649
)
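As an aside, the nested loops can be condensed into two XPath string queries. This is only a sketch of an alternative, not part of the original answer; it assumes there is a single span with id minPrice and that the link is its direct child.

```php
<?php
$html = '<p><span id="minPrice">最安価格(税込):<a href="http://kakaku.com/shop/1115/?pdid=K0000693648&lid=shop_itemview_saiyasukakaku" target="_blank"><span>&yen;131,649</span></a></span>';

libxml_use_internal_errors(true); // keep parser complaints out of the output
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// string(...) returns the string value of the first matching node,
// or an empty string when nothing matches.
$href = $xpath->evaluate('string(//span[@id="minPrice"]/a/@href)');
$text = $xpath->evaluate('string(//span[@id="minPrice"]/a)');

// Drop everything before the first digit (here, the yen sign).
$price = preg_replace('/^\D+/', '', $text);

print_r(array($href, $price));
```

Because an unmatched query yields an empty string rather than an error, it is worth checking both values before using them.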

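In the loop above I used a regular expression to extract the number at the end of the string, but you could also use explode with the yen sign: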

$num = explode(html_entity_decode("&yen;"), $child->nodeValue)[1];
array_push($a, $num);

See another demo.
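If the DOM parser really cannot cope with the document, a regex-only fallback along the lines of the question's first attempt might look like the sketch below. In that attempt, ^ is an anchor rather than a wildcard, and the pattern has no capture group, so $matches[1] is never set. The group names label, url, and price are my own, and the pattern assumes the markup has exactly the shape shown in the question. The u modifier is left out on purpose, because with it preg_match fails on subjects that are not valid UTF-8.

```php
<?php
// The placeholder string from the question.
$data = '<p><span id="minPrice">XXXX<a href="YYYYY" target="_blank"><span>&yen;ZZZZZ</span></a></span>';

// Named groups capture the three values; the s modifier lets "." span
// newlines in case the markup is split across lines.
$pattern = '~<span id="minPrice">(?<label>.*?)<a href="(?<url>[^"]*)"[^>]*>\s*<span>&yen;(?<price>[^<]*)</span>~s';

if (preg_match($pattern, $data, $m)) {
    echo $m['label'], "\n"; // text before the link
    echo $m['url'], "\n";   // href value
    echo $m['price'], "\n"; // text after the &yen; entity
}
```

A pattern like this breaks as soon as the attribute order or nesting changes, which is why the DOM approach is usually preferable when the parser can load the page.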

Regarding "php - How to extract data from this string", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/36079886/
