
Batch file: extract IDs from an HTML file with regex

Reposted. Author: bug小助手. Updated: 2023-10-25 16:23:33



I always define the list of IDs manually in a list.txt file and build a long string from it.
I want to create the same long string, but by parsing an HTML file and extracting the IDs from the links inside it.



Note: the number of IDs in list.txt or list.html can vary.



// list.txt



450814997
463939057

// list.bat



set "my_directory=c:\server"
set "list="
for /F "tokens=*" %%A IN ('Type "list.txt"') do (
set "list=!list!%my_directory%\%%A;"
)
echo %list%

// list.html



  <body>
<div class="mod-list">
<table>
<tr data-type="ModContainer">
<td data-type="DisplayName">CBA_A3</td>
<td>
<span class="from-steam">Steam</span>
</td>
<td>
<a href="https://steamcommunity.com/sharedfiles/filedetails/?id=450814997" data-type="Link">https://steamcommunity.com/sharedfiles/filedetails/?id=450814997</a>
</td>
</tr>
<tr data-type="ModContainer">
<td data-type="DisplayName">ace</td>
<td>
<span class="from-steam">Steam</span>
</td>
<td>
<a href="https://steamcommunity.com/sharedfiles/filedetails/?id=463939057" data-type="Link">https://steamcommunity.com/sharedfiles/filedetails/?id=463939057</a>
</td>
</tr>
</table>
</div>
<div class="dlc-list">
<table />
</div>
<div class="footer">
<span>Created by Arma 3 Launcher by Bohemia Interactive.</span>
</div>
</body>

Expected output:



c:\server\450814997;c:\server\463939057;

Comments:

Your batch file, as posted, and assuming that you have enabled delayed expansion, should output c:\server\450814997;c:\server\463939057;


If your HTML file does not form part of your question, please edit your post to remove it and all references to it. If it does, please add sufficient information to explain its relevance to your specific problem.


@Compo I posted the HTML file as a reference, to show the structure that the relevant IDs should be extracted from.


So, as it has no bearing whatsoever on your question, I've removed it. If you wanted help with some code which takes the IDs directly from the HTML file instead of from the txt file, you should have posted your code attempt. This is not a free code-writing service, where you post code which does something else and we change it for you.


Restating the question: This is how I've done it using list.txt. How do I do it using list.html?


Recommended answers:

@ECHO OFF
SETLOCAL ENABLEDELAYEDEXPANSION
set "my_directory=c:\server"
set "list="
for /F "tokens=3delims=?" %%E IN ('Type "q77047266.txt"') do (
SET "token3=%%E"
IF DEFINED token3 set "list=!list!%my_directory%\!token3:~3,-4!;"
)
echo %list%

GOTO :EOF

I used a file named q77047266.txt containing your HTML data for my testing.



You don't specify whether the required string should be extracted from its first or second occurrence on the line. I chose the last.



Using ? as a delimiter, grab the part of the line after the second ? (token 3), then append the result to list, with the decoration but without the first 3 characters (id=) and the last 4 characters (</a>).
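
To see where the offsets 3 and -4 come from, the same split can be traced outside of batch. This is only a minimal Python sketch and not part of the answer's batch code; the function name id_from_line is made up for illustration, and the sample line is copied from list.html.

```python
def id_from_line(line):
    """Mimic: for /F "tokens=3 delims=?" followed by !token3:~3,-4!"""
    # for /F treats consecutive delimiters as one, so empty pieces are dropped.
    tokens = [t for t in line.split("?") if t]
    if len(tokens) < 3:
        return None  # no third token on this line, nothing to extract
    token3 = tokens[2]   # 'id=450814997</a>'
    return token3[3:-4]  # drop 'id=' (3 chars) and '</a>' (4 chars)


line = ('<a href="https://steamcommunity.com/sharedfiles/filedetails/?id=450814997" '
        'data-type="Link">https://steamcommunity.com/sharedfiles/filedetails/?id=450814997</a>')
print(id_from_line(line))  # 450814997
```

Lines such as <td> contain no ? at all, so they yield no third token and contribute nothing to the list.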





I want create same long string but to parse a html file and export IDs from links inside.



Doing a quick search, that HTML is from https://atwar.online/arma/Arma_3_Preset_hoggit_no_jsrs.html, right?



To parse this HTML source I'd highly recommend the XML/HTML/JSON parser xidel.



First the two <tr>-nodes you're after:



xidel -s "https://atwar.online/arma/Arma_3_Preset_hoggit_no_jsrs.html"^
-e "//table/tbody/tr[td[@data-type='DisplayName']=('CBA_A3','ace')]"^
--output-node-format=xml --output-node-indent
<tr data-type="ModContainer">
<td data-type="DisplayName">CBA_A3</td>
<td>
<span class="from-steam">Steam</span>
</td>
<td>
<a href="http://steamcommunity.com/sharedfiles/filedetails/?id=450814997" data-type="Link">http://steamcommunity.com/sharedfiles/filedetails/?id=450814997</a>
</td>
</tr>
<tr data-type="ModContainer">
<td data-type="DisplayName">ace</td>
<td>
<span class="from-steam">Steam</span>
</td>
<td>
<a href="http://steamcommunity.com/sharedfiles/filedetails/?id=463939057" data-type="Link">http://steamcommunity.com/sharedfiles/filedetails/?id=463939057</a>
</td>
</tr>

xidel -s "https://atwar.online/arma/Arma_3_Preset_hoggit_no_jsrs.html"^
-e "//table/tbody/tr[td[@data-type='DisplayName']=('CBA_A3','ace')]/td/a/@href"
http://steamcommunity.com/sharedfiles/filedetails/?id=450814997
http://steamcommunity.com/sharedfiles/filedetails/?id=463939057

Next you can use request-decode() to retrieve the ids:



xidel -s "https://atwar.online/arma/Arma_3_Preset_hoggit_no_jsrs.html"^
-e "//table/tbody/tr[td[@data-type='DisplayName']=('CBA_A3','ace')]/td/a/request-decode(@href)"
{
"url": "http://steamcommunity.com/sharedfiles/filedetails/?id=450814997",
"protocol": "http",
"host": "steamcommunity.com",
"path": "sharedfiles/filedetails/",
"query": "id=450814997",
"params": {
"id": "450814997"
}
}
{
"url": "http://steamcommunity.com/sharedfiles/filedetails/?id=463939057",
"protocol": "http",
"host": "steamcommunity.com",
"path": "sharedfiles/filedetails/",
"query": "id=463939057",
"params": {
"id": "463939057"
}
}

xidel -s "https://atwar.online/arma/Arma_3_Preset_hoggit_no_jsrs.html"^
-e "//table/tbody/tr[td[@data-type='DisplayName']=('CBA_A3','ace')]/td/a/request-decode(@href)/params/id"
450814997
463939057
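
For comparison, the same decomposition of a URL into its parts and query parameters can be sketched with Python's standard library. This is an illustration only, not what xidel does internally; note that urlsplit keeps the leading slash of the path, unlike the xidel output above.

```python
from urllib.parse import urlsplit, parse_qs

url = "http://steamcommunity.com/sharedfiles/filedetails/?id=450814997"
parts = urlsplit(url)
params = parse_qs(parts.query)  # maps each parameter name to a list of values

print(parts.scheme)     # http
print(parts.netloc)     # steamcommunity.com
print(parts.path)       # /sharedfiles/filedetails/
print(parts.query)      # id=450814997
print(params["id"][0])  # 450814997
```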

Then it's just a matter of creating the specific string you want. You can do this with concat() of course, or with XPath 4.0 String Templates (provided you're using an up-to-date Xidel binary):



xidel -s "https://atwar.online/arma/Arma_3_Preset_hoggit_no_jsrs.html"^
-e "//table/tbody/tr[td[@data-type='DisplayName']=('CBA_A3','ace')]/td/a/request-decode(@href)/params/concat('c:\server\',id,';')"
#or
xidel -s "https://atwar.online/arma/Arma_3_Preset_hoggit_no_jsrs.html"^
-e "//table/tbody/tr[td[@data-type='DisplayName']=('CBA_A3','ace')]/td/a/request-decode(@href)/params/`c:\server\{id};`"
c:\server\450814997;
c:\server\463939057;

And finally, use string-join() or --output-separator='' to put everything on a single line:



xidel -s "https://atwar.online/arma/Arma_3_Preset_hoggit_no_jsrs.html"^
-e "string-join(//table/tbody/tr[td[@data-type='DisplayName']=('CBA_A3','ace')]/td/a/request-decode(@href)/params/`c:\server\{id};`)"
#or
xidel -s "https://atwar.online/arma/Arma_3_Preset_hoggit_no_jsrs.html"^
-e "//table/tbody/tr[td[@data-type='DisplayName']=('CBA_A3','ace')]/td/a/request-decode(@href)/params/`c:\server\{id};`"^
--output-separator=''
c:\server\450814997;c:\server\463939057;

If you want all ids, then simply remove the condition (between [ ]):



xidel -s "https://atwar.online/arma/Arma_3_Preset_hoggit_no_jsrs.html"^
-e "//table/tbody/tr/td/a/request-decode(@href)/params/`c:\server\{id};`"^
--output-separator=''
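
If xidel is not available, roughly the same pipeline (collect every link, take its id parameter, prefix the directory, join everything on one line) can be sketched with Python's standard library. This is a hedged sketch under stated assumptions, not a replacement for the xidel commands: the names LinkIdCollector and build_list are made up for illustration, and it assumes that every relevant link is an <a> element with data-type="Link", as in the list.html shown in the question.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit, parse_qs


class LinkIdCollector(HTMLParser):
    """Collect the id query parameter of every <a data-type="Link"> href."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("data-type") == "Link":
            href = attrs.get("href") or ""
            query = parse_qs(urlsplit(href).query)
            self.ids.extend(query.get("id", []))


def build_list(html, directory=r"c:\server"):
    """Return 'directory\\id;' for every collected id, joined on one line."""
    collector = LinkIdCollector()
    collector.feed(html)
    return "".join(directory + "\\" + mod_id + ";" for mod_id in collector.ids)


# Two rows shortened from list.html; the link text is abbreviated here.
sample = """
<tr data-type="ModContainer">
<td data-type="DisplayName">CBA_A3</td>
<td><a href="https://steamcommunity.com/sharedfiles/filedetails/?id=450814997" data-type="Link">CBA_A3 link</a></td>
</tr>
<tr data-type="ModContainer">
<td data-type="DisplayName">ace</td>
<td><a href="https://steamcommunity.com/sharedfiles/filedetails/?id=463939057" data-type="Link">ace link</a></td>
</tr>
"""
print(build_list(sample))  # c:\server\450814997;c:\server\463939057;
```

To run it against a real file, the contents of list.html would be read into a string first and passed to build_list; the file's encoding is something you would need to check for your own preset.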
