batch file regex ids from html file

Reposted by: bug小助手 | Updated: 2023-10-25 16:23:33

I currently define a list of IDs by hand in a list.txt file and build one long string from it.
I want to build the same long string, but by parsing an HTML file and extracting the IDs from the links inside it.

Note: the number of IDs in list.txt or list.html can vary.

// list.txt

450814997
463939057

// list.bat

set "my_directory=c:\server"
set "list="
for /F "tokens=*" %%A IN ('Type "list.txt"') do (
    set "list=!list!%my_directory%\%%A;"
)
echo %list%

// list.html

  <body>
    <div class="mod-list">
      <table>
        <tr data-type="ModContainer">
          <td data-type="DisplayName">CBA_A3</td>
          <td>
            <span class="from-steam">Steam</span>
          </td>
          <td>
            <a href="https://steamcommunity.com/sharedfiles/filedetails/?id=450814997" data-type="Link">https://steamcommunity.com/sharedfiles/filedetails/?id=450814997</a>
          </td>
        </tr>
        <tr data-type="ModContainer">
          <td data-type="DisplayName">ace</td>
          <td>
            <span class="from-steam">Steam</span>
          </td>
          <td>
            <a href="https://steamcommunity.com/sharedfiles/filedetails/?id=463939057" data-type="Link">https://steamcommunity.com/sharedfiles/filedetails/?id=463939057</a>
          </td>
        </tr>
      </table>
    </div>
    <div class="dlc-list">
      <table />
    </div>
    <div class="footer">
      <span>Created by Arma 3 Launcher by Bohemia Interactive.</span>
    </div>
  </body>

expected output:

c:\server\450814997;c:\server\463939057;

Comments

Your batch file, as posted, and assuming that you have enabled delayed expansion, should output c:\server\450814997;c:\server\463939057;

If your HTML file does not form part of your question, please edit your post to remove it and its references. If it does, please add sufficient information to explain its relevance to your specific problem.

@Compo I posted the HTML file as a reference, to show the structure that the relevant IDs need to be extracted from.

So, as it has no bearing whatsoever on your question, I've removed it. If you wanted help with some code which takes the IDs directly from the HTML file instead of from the txt file, you should have posted your code attempt. This is not a free code writing service, where you post code which does something else, and we change it for you.

Restating the question: This is how I've done it using list.txt. How do I do it using list.html?

Recommended answers

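@ECHO OFF
SETLOCAL ENABLEDELAYEDEXPANSION
set "my_directory=c:\server"
set "list="
rem Split each line on "?"; token 3 is the text after the second "?".
rem The substring ~3,-4 drops the leading "id=" and the trailing "</a>".
for /F "tokens=3 delims=?" %%E IN ('Type "q77047266.txt"') do (
 SET "token3=%%E"
 IF DEFINED token3 set "list=!list!%my_directory%\!token3:~3,-4!;"
)
echo %list%

GOTO :EOF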

I used a file named q77047266.txt containing your HTML data for my testing.

You don't specify whether the required string should be extracted from its first or second occurrence on the line. I chose the last.

Using ? as a delimiter, grab the part of the line after the second ? (token 3), then append the result to list with the decoration (the directory prefix and the trailing ;), but without the first 3 characters (id=) and the last 4 characters (</a>).
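
For readers who want to see the token logic outside of batch, here is a minimal Python sketch of the same steps. It is not part of the answer above; the helper name third_token is hypothetical, the sample line is copied from the list.html in the question, and the split only approximates how FOR /F treats delimiters.

```python
# Sample line copied from the list.html shown in the question.
line = ('<a href="https://steamcommunity.com/sharedfiles/filedetails/?id=450814997" '
        'data-type="Link">https://steamcommunity.com/sharedfiles/filedetails/?id=450814997</a>')


def third_token(text, delim="?"):
    """Approximate FOR /F "tokens=3 delims=?" (hypothetical helper).

    Split on the delimiter, ignore empty pieces, and return the third
    piece, or None when the line has fewer than three tokens.
    """
    tokens = [piece for piece in text.split(delim) if piece]
    return tokens[2] if len(tokens) >= 3 else None


token3 = third_token(line)   # text after the second "?"
mod_id = token3[3:-4]        # same trim as !token3:~3,-4!
print(token3)
print("c:\\server\\" + mod_id + ";")
```

The slice [3:-4] removes the 3 characters of "id=" from the front and the 4 characters of "</a>" from the end, which is why the batch substring uses ~3,-4.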

I want create same long string but to parse a html file and export IDs from links inside.

Doing a quick search, that HTML is from https://atwar.online/arma/Arma_3_Preset_hoggit_no_jsrs.html, right?

To parse this HTML-source I'd highly recommend the XML/HTML/JSON parser xidel.

First the two <tr>-nodes you're after:

xidel -s "https://atwar.online/arma/Arma_3_Preset_hoggit_no_jsrs.html"^
      -e "//table/tbody/tr[td[@data-type='DisplayName']=('CBA_A3','ace')]"^
      --output-node-format=xml --output-node-indent
<tr data-type="ModContainer">
  <td data-type="DisplayName">CBA_A3</td>
  <td>
    <span class="from-steam">Steam</span>
  </td>
  <td>
    <a href="http://steamcommunity.com/sharedfiles/filedetails/?id=450814997" data-type="Link">http://steamcommunity.com/sharedfiles/filedetails/?id=450814997</a>
  </td>
</tr>
<tr data-type="ModContainer">
  <td data-type="DisplayName">ace</td>
  <td>
    <span class="from-steam">Steam</span>
  </td>
  <td>
    <a href="http://steamcommunity.com/sharedfiles/filedetails/?id=463939057" data-type="Link">http://steamcommunity.com/sharedfiles/filedetails/?id=463939057</a>
  </td>
</tr>

xidel -s "https://atwar.online/arma/Arma_3_Preset_hoggit_no_jsrs.html"^
      -e "//table/tbody/tr[td[@data-type='DisplayName']=('CBA_A3','ace')]/td/a/@href"
http://steamcommunity.com/sharedfiles/filedetails/?id=450814997
http://steamcommunity.com/sharedfiles/filedetails/?id=463939057

Next you can use request-decode() to retrieve the ids:

xidel -s "https://atwar.online/arma/Arma_3_Preset_hoggit_no_jsrs.html"^
      -e "//table/tbody/tr[td[@data-type='DisplayName']=('CBA_A3','ace')]/td/a/request-decode(@href)"
{
  "url": "http://steamcommunity.com/sharedfiles/filedetails/?id=450814997",
  "protocol": "http",
  "host": "steamcommunity.com",
  "path": "sharedfiles/filedetails/",
  "query": "id=450814997",
  "params": {
    "id": "450814997"
  }
}
{
  "url": "http://steamcommunity.com/sharedfiles/filedetails/?id=463939057",
  "protocol": "http",
  "host": "steamcommunity.com",
  "path": "sharedfiles/filedetails/",
  "query": "id=463939057",
  "params": {
    "id": "463939057"
  }
}

xidel -s "https://atwar.online/arma/Arma_3_Preset_hoggit_no_jsrs.html"^
      -e "//table/tbody/tr[td[@data-type='DisplayName']=('CBA_A3','ace')]/td/a/request-decode(@href)/params/id"
450814997
463939057
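
As a point of comparison only, the same "decode the URL, then read the id parameter" idea can be sketched with Python's standard urllib.parse module. This is not xidel and not part of the answer; the function name steam_id is hypothetical.

```python
from urllib.parse import urlsplit, parse_qs


def steam_id(url):
    """Return the value of the 'id' query parameter, or None if it is absent."""
    params = parse_qs(urlsplit(url).query)  # e.g. {'id': ['450814997']}
    values = params.get("id")
    return values[0] if values else None


url = "http://steamcommunity.com/sharedfiles/filedetails/?id=450814997"
print(urlsplit(url).query)
print(steam_id(url))
```

Reading the parameter by name, rather than by character position, keeps working if the link later gains extra query parameters.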

Then it's just a matter of creating the specific string you want. You can do this with concat() of course, or with XPath 4.0 String Templates (provided you're using an up-to-date Xidel binary):

xidel -s "https://atwar.online/arma/Arma_3_Preset_hoggit_no_jsrs.html"^
      -e "//table/tbody/tr[td[@data-type='DisplayName']=('CBA_A3','ace')]/td/a/request-decode(@href)/params/concat('c:\server\',id,';')"
#or
xidel -s "https://atwar.online/arma/Arma_3_Preset_hoggit_no_jsrs.html"^
      -e "//table/tbody/tr[td[@data-type='DisplayName']=('CBA_A3','ace')]/td/a/request-decode(@href)/params/`c:\server\{id};`"
c:\server\450814997;
c:\server\463939057;

And finally string-join() or --output-separator='' to put everything on a single line:

xidel -s "https://atwar.online/arma/Arma_3_Preset_hoggit_no_jsrs.html"^
      -e "string-join(//table/tbody/tr[td[@data-type='DisplayName']=('CBA_A3','ace')]/td/a/request-decode(@href)/params/`c:\server\{id};`)"
#or
xidel -s "https://atwar.online/arma/Arma_3_Preset_hoggit_no_jsrs.html"^
      -e "//table/tbody/tr[td[@data-type='DisplayName']=('CBA_A3','ace')]/td/a/request-decode(@href)/params/`c:\server\{id};`"^
      --output-separator=''
c:\server\450814997;c:\server\463939057;

If you want all ids, then simply remove the condition (between [ ]):

xidel -s "https://atwar.online/arma/Arma_3_Preset_hoggit_no_jsrs.html"^
      -e "//table/tbody/tr/td/a/request-decode(@href)/params/`c:\server\{id};`"^
      --output-separator=''
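
If xidel is not available, the whole pipeline (find every link, read its id parameter, join the results into one string) can be approximated with Python's standard library. This is a rough sketch under stated assumptions, not the answer's method: the names LinkIdCollector and build_list are hypothetical, the sample markup is a trimmed copy of the list.html in the question, and every <a> element with an id query parameter is collected, with no filtering on DisplayName.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit, parse_qs


class LinkIdCollector(HTMLParser):
    """Collect the 'id' query parameter from every <a href="..."> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        values = parse_qs(urlsplit(href).query).get("id")
        if values:
            self.ids.append(values[0])


def build_list(html_text, directory="c:\\server"):
    """Return 'directory\\id;' for every link id found, joined into one string."""
    collector = LinkIdCollector()
    collector.feed(html_text)
    return "".join(directory + "\\" + mod_id + ";" for mod_id in collector.ids)


# Trimmed copy of the list.html structure from the question.
sample = """
<table>
  <tr data-type="ModContainer">
    <td data-type="DisplayName">CBA_A3</td>
    <td><a href="https://steamcommunity.com/sharedfiles/filedetails/?id=450814997" data-type="Link">link</a></td>
  </tr>
  <tr data-type="ModContainer">
    <td data-type="DisplayName">ace</td>
    <td><a href="https://steamcommunity.com/sharedfiles/filedetails/?id=463939057" data-type="Link">link</a></td>
  </tr>
</table>
"""

print(build_list(sample))
```

To run it against a real file, read list.html into a string first and pass that string to build_list.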
