
java - Extracting JSON as a string from a JSP

Reposted. Author: 行者123. Updated: 2023-12-02 13:27:04

I am parsing the page source of https://massive.ucsd.edu/ProteoSAFe/datasets.jsp. I want to parse the .jsp and extract the JSON objects from it.

I am using Jsoup to fetch the data:

Document doc = Jsoup.connect("https://massive.ucsd.edu/ProteoSAFe/datasets.jsp").maxBodySize(0).get();

Then I use a Java Pattern to extract the JSON as a string:

Pattern p = Pattern.compile(String.format("\"%s\":\\s*(.*),", "dataset","\"%s\":\\s*(.*),", "datasetNum","\"%s\":\\s*(.*),", "title","\"%s\":\\s*(.*),", "user","\"%s\":\\s*(.*),", "site","\"%s\":\\s*(.*),", "flowname","\"%s\":\\s*(.*),", "createdMillis","\"%s\":\\s*(.*),", "created","\"%s\":\\s*(.*),", "fileCount","\"%s\":\\s*(.*),", "fileSizeKB","\"%s\":\\s*(.*),", "psms","\"%s\":\\s*(.*),", "peptides","\"%s\":\\s*(.*),", "variants","\"%s\":\\s*(.*),", "proteins","\"%s\":\\s*(.*),", "species","\"%s\":\\s*(.*),", "instrument","\"%s\":\\s*(.*),", "modification","\"%s\":\\s*(.*),", "pi","\"%s\":\\s*(.*),", "complete","\"%s\":\\s*(.*),", "status","\"%s\":\\s*(.*),", "private","\"%s\":\\s*(.*),", "hash","\"%s\":\\s*(.*),", "px","\"%s\":\\s*(.*),", "task","\"%s\":\\s*(.*),", "id"));

Matcher m = p.matcher(script.html());

When I do this I get an error. The last line is not parsed correctly: it ends up being cut off, so I get a

"JSONObject text must end with '}' at character 577" error.

Can anyone suggest a better way to parse this page and get the data?

Best answer

Although parsing any HTML with a regular expression seems like a bad idea, this worked for me:

Pattern.compile("(?s)var datasets = (\\[.*?\\]);")

(Tested in Python, since that is all I have available.)

Note that it returns a JSONArray, not a JSONObject.
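For illustration, here is a minimal sketch of how that pattern might be applied in Java. The class name `DatasetJsonExtractor`, the method `extractDatasets`, and the sample HTML string are hypothetical. The sketch assumes the page embeds its data as a `var datasets = [...];` assignment inside a script block, as the pattern implies.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls the array literal assigned to `var datasets` out of page source.
final class DatasetJsonExtractor {

    // (?s) lets '.' match line breaks; the reluctant .*? stops at the first "];" it meets.
    private static final Pattern DATASETS =
            Pattern.compile("(?s)var datasets = (\\[.*?\\]);");

    /** Returns the raw JSON array text, or an empty Optional if the assignment is absent. */
    static Optional<String> extractDatasets(String html) {
        Matcher m = DATASETS.matcher(html);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Made-up stand-in for the real page source (which would come from Jsoup).
        String html = "<script>\nvar datasets = [\n{\"dataset\":\"demo\",\"title\":\"example\"}\n];\n</script>";
        System.out.println(extractDatasets(html).orElse("no match"));
    }
}
```

The captured text starts with `[`, so with org.json it would be passed to `new JSONArray(...)` rather than `new JSONObject(...)`. One caveat: because the match ends at the first `];`, a string value containing that exact sequence would truncate the result.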

Regarding "java - Extracting JSON as a string from a JSP", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/43354432/
