This is a follow-up question about cleaning a SPARQL dataset.
The dataset is obtained from this cell:
#+name: raw-dataset
#+BEGIN_SRC sparql :url https://query.wikidata.org/sparql :format text/csv :cache yes :exports both
SELECT ?wLabel ?pLabel
WHERE
{
?p wdt:P31 wd:Q98270496 .
?p wdt:P1416 ?w .
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
ORDER BY ASC(?wLabel) ASC(?pLabel)
LIMIT 10
#+END_SRC
#+RESULTS[7981b64721a5ffc448aa7da773ce07ea8dbaf8ac]: raw-dataset
| wLabel | pLabel |
|-----------------------------------------------+--------------|
| Q105775472 | NFDI4Health |
| Q1117007 | NFDI4Health |
| Q115254989 | NFDI4Objects |
| Q1205424 | NFDI4Objects |
| Q17575706 | NFDI4Objects |
| Academy of Sciences and Humanities in Hamburg | Text+ |
| Academy of Sciences and Literature Mainz | NFDI4Culture |
| Academy of Sciences and Literature Mainz | NFDI4Memory |
| Academy of Sciences and Literature Mainz | NFDI4Objects |
| Academy of Sciences and Literature Mainz | Text+ |
As a second step, I take a look at the data using zsh:
#+begin_src sh :var data=raw-dataset :shebang "#!/opt/homebrew/bin/zsh" :exports both
echo ${data}
#+end_src
#+RESULTS:
| Q105775472 | NFDI4Health |
| Q1117007 | NFDI4Health |
| Q115254989 | NFDI4Objects |
| Q1205424 | NFDI4Objects |
| Q17575706 | NFDI4Objects |
| Academy of Sciences and Humanities in Hamburg | Text+ |
| Academy of Sciences and Literature Mainz | NFDI4Culture |
| Academy of Sciences and Literature Mainz | NFDI4Memory |
| Academy of Sciences and Literature Mainz | NFDI4Objects |
| Academy of Sciences and Literature Mainz | Text+ |
All fine, and I can start with the cleaning part, getting rid of all lines containing Q...:
#+begin_src sh :var data=raw-dataset :exports both :shebang "#!/opt/homebrew/bin/zsh"
echo ${data} | grep -L -E "Q[1-9]"
#+end_src
#+RESULTS:
Somehow, using -L does not work. But without -L I get the results as expected from the code:
#+begin_src sh :var data=raw-dataset :shebang "#!/opt/homebrew/bin/zsh" :exports both
echo ${data} | grep -E "Q[1-9]"
#+end_src
#+RESULTS:
| Q105775472 | NFDI4Health |
| Q1117007 | NFDI4Health |
| Q115254989 | NFDI4Objects |
| Q1205424 | NFDI4Objects |
| Q17575706 | NFDI4Objects |
Question: Why is -L not working, and how can I get rid of the lines starting with Q...?
Comments
Please edit your question and add your desired output (no description) for that sample input.
Use grep -v to exclude lines. grep -L works on entire files (it lists the names of files that contain no match), which doesn't make sense in a pipeline.
Of course... do you want to turn this into an answer?
Recommended answer
To exclude lines beginning with Q followed by a digit:
... | grep -v '^Q[0-9]'
... | grep -Pv '^Q\d'
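As far as I know, -P (Perl-compatible regular expressions) is a GNU grep extension and is not available in the BSD grep that ships with macOS, so the second variant may fail depending on which grep is first on the PATH. A minimal sketch of two portable alternatives follows; the two-row, tab-separated `sample` is a hypothetical stand-in for the real dataset, and the variable names are only for illustration:

```shell
# Hypothetical sample: two rows taken from the question's table, tab-separated.
sample=$(printf 'Q1205424\tNFDI4Objects\nAcademy of Sciences and Humanities in Hamburg\tText+\n')

# POSIX character class instead of \d, so no -P is needed.
cleaned=$(printf '%s\n' "$sample" | grep -v '^Q[[:digit:]]')

# awk equivalent: print only the lines that do NOT match the pattern.
cleaned_awk=$(printf '%s\n' "$sample" | awk '!/^Q[0-9]/')

printf '%s\n' "$cleaned"
printf '%s\n' "$cleaned_awk"
```

Both commands should leave only the row whose first column is a proper label rather than a bare Q identifier.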
As mentioned in the comment, -L is a rare command-line option (--files-without-match) for excluding entire files. -v is quite common, for excluding lines matching the pattern or patterns. The v, as mentioned in man grep, is from --invert-match.
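To see why the -L cell in the question produced an empty result, it may help to run -L against standard input with and without a matching line. This is only a sketch: the sample rows are a hypothetical excerpt of the table, and the exact label printed for standard input can differ between grep implementations.

```shell
# Hypothetical samples: one input with a Q row, one without.
with_q=$(printf 'Q1117007\tNFDI4Health\nAcademy of Sciences and Literature Mainz\tText+\n')
without_q=$(printf 'Academy of Sciences and Literature Mainz\tText+\n')

# The input contains a matching line, so it is not a "file without match":
# -L prints nothing, which is the empty result seen in the question.
out_match=$(printf '%s\n' "$with_q" | grep -L '^Q[0-9]' || true)

# The input has no matching line, so -L prints the input's name
# (a label for standard input), never the lines themselves.
out_nomatch=$(printf '%s\n' "$without_q" | grep -L '^Q[0-9]' || true)

printf 'with a match:    [%s]\n' "$out_match"
printf 'without a match: [%s]\n' "$out_nomatch"
```

The `|| true` is there only because the exit status of -L has varied between grep versions; it keeps the sketch from aborting under `set -e`.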