java - gawk or grep: single line and ungreedy

Reposted. Author: 行者123. Last updated: 2023-12-03 09:54:00

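I want to print, recursively in all subdirectories, the headers of *.java files whose class has more than two type parameters (i.e. the parameters between <R ... H> in the example below). One of the files looks like this (names are shortened for brevity):
multiple-lines.java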

class ClazzA<R extends A,
S extends B<T>, T extends C<T>,
U extends D, W extends E,
X extends F, Y extends G, Z extends H>
extends OtherClazz<S> implements I<T> {

public void method(Type<Q, R> x) {
// ... code ...
}
}
with the expected output:
ClazzA.java:10: class ClazzA<R extends A,
ClazzA.java:11: S extends B<T>, T extends C<T>,
ClazzA.java:12: U extends D, W extends E,
ClazzA.java:13: X extends F, Y extends G, Z extends H>
ClazzA.java:14: extends OtherClazz<S> implements I<T> {
But another one may also look like this:
single-line.java
class ClazzB<R extends A, S extends B<T>, T extends C<T>, U extends D, W extends E, X extends F, Y extends G, Z extends H> extends OtherClazz<S> implements I<T> {

public void method(Type<Q, R> x) {
// ... code ...
}
}
with the expected output:
ClazzB.java:42: class ClazzB<R extends A, S extends B<T>, T extends C<T>, U extends D, W extends E, X extends F, Y extends G, Z extends H> extends OtherClazz<S> implements I<T> {
Files that should not be considered/printed:
X-no-parameter.java
class ClazzC /* no type parameter */ extends OtherClazz<S> implements I<T> {

public void method(Type<A, B> x) {
// ... code ...
}
}
X-one-parameter.java
class ClazzD<R extends A>  // only one type parameter
extends OtherClazz<S> implements I<T> {

public void method(Type<X, Y> x) {
// ... code ...
}
}
X-two-parameters.java
class ClazzE<R extends A, S extends B<T>>  // only two type parameters
extends OtherClazz<S> implements I<T> {

public void method(Type<X, Y> x) {
// ... code ...
}
}
X-two-line-parameters.java
class ClazzF<R extends A,  // only two type parameters
S extends B<T>> // on two lines
extends OtherClazz<S> implements I<T> {

public void method(Type<X, Y> x) {
// ... code ...
}
}
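All spaces in the files can be \s+. The extends [...] and implements [...] immediately before the { are optional. The extends [...] on each type parameter is optional as well. For details, see The Java® Language Specification, 8.1. Class Declarations.
I'm using gawk in Git Bash: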
$ gawk --version
GNU Awk 5.0.0, API: 2.0 (GNU MPFR 4.0.2, GNU MP 6.2.0)
with:
find . -type f -name '*.java' | xargs gawk -f ws-class-type-parameter.awk > ws-class-type-parameter.log
ws-class-type-parameter.awk:
# /start/ , /end/ ... pattern

#/class ClazzA<.*,.*/ , /{/ { # 5 lines, OK for ClazzA, but in real it prints classes with 2 or less type parameters, too
#/class ClazzA<.*,.*,/ , /{/ { # no line with ClazzA, since there's no second ',' on its first line
#/class ClazzA<.*,.*,/s , /{/ { # 500.000+(!) lines
#/class ClazzA<.*,.*,/s , /{/U { # 500.000+(!) lines
#/class ClazzA<.*,.*,/sU , /{/U { # 500.000+(!) lines
/(?s)class ClazzA<.*,.*,/ , /{/ { # no line

    match( FILENAME, "/.*/.." )
    print substr( FILENAME, RLENGTH ) ":" FNR ": " $0
}
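This finds all the *.java files ... fine, and runs gawk on each of them ... fine, but the results of my attempts are shown as comments in the script. Please note: the ClazzA literal is only there for testing and for the MCVE here. In reality it would be \w+, but that gives 500.000+ lines across thousands of files while testing...
It works if I try it at regex101.com. Well, sort of. I didn't find out how to define /start-regex/,/end-regex/ there, so I added another .* between the two.
I took the flags from there, but I couldn't find a description of whether gawk supports the flag syntax /.../sU , /.../U, so I simply tried it. A comment that has since been deleted told me that no awk flavor supports this.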
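Since awk regexes have neither inline flags nor lazy quantifiers, the usual way to get "ungreedy" behaviour is a negated bracket expression. The following is a minimal sketch in POSIX awk, not part of the original post: the function name header_of is hypothetical, and [ \t] stands in for \s. It works on a single line only and stops at the first { instead of the last one, which is what .*{ would reach.

```shell
# Hypothetical helper: print the part of each line from "class Name"
# up to and including the FIRST "{" on that line.
header_of() {
    awk '{
        # [^{]* cannot run past a "{", so the match ends at the first one;
        # a greedy .* followed by "{" would extend to the last one instead.
        if (match($0, /class[ \t]+[A-Za-z_][A-Za-z0-9_]*[^{]*[{]/))
            print substr($0, RSTART, RLENGTH)
    }' "$@"
}
```

For a line such as class A<X, Y, Z> { void m() { } } this prints only the header up to the first brace. It does not solve the multi-line case, because awk still tests the regex against one record (line) at a time.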
I also tried grep:
$ grep --version
grep (GNU grep) 3.1
...
$ grep -nrPf types.grep *.java
with types.grep:
(?s).*class\s+\w+\s*<.*,.*,.*>.*{
The result contains only the output for single-line.java. (?s) is --perl-regexp (-P) syntax, and grep --help claims to support this.
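A likely reason for this result is that grep hands the pattern one line at a time, so (?s) has no newline to match across. GNU grep's -z option treats the input as NUL-terminated records, which for ordinary Java source means the whole file is a single record. The sketch below is only an illustration under that assumption (GNU grep, no NUL bytes in the files). The function name class_headers is hypothetical, and it uses -E with a negated bracket expression rather than -P so that it does not depend on PCRE support. It extracts headers but does not count type parameters.

```shell
# Hypothetical helper: print the header (from "class Name<" to the first "{")
# of every generic class in the given files, even across several lines.
#   -z  treat each file as one NUL-terminated record (assumes GNU grep)
#   -o  print only the matched text
#   -E  extended regex; with -z, [^{]* can also match newlines
class_headers() {
    grep -zoE 'class[[:space:]]+[[:alnum:]_]+[[:space:]]*<[^{]*[{]' "$@" |
        tr '\0' '\n'    # -z terminates each match with NUL, make it a newline
}
```

Line numbers are lost with this approach, since -n would count NUL-terminated records rather than lines.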
UPDATE
The solution in Ed Morton's answer works great, but it turned out that there are auto-generated files containing:
    /** more code before here */    
public void setId(String value) {
this.id = value;
}

/**
* Gets a map that contains attributes that aren't bound to any typed property on this class.
*
* <p>
* the map is keyed by the name of the attribute and
* the value is the string value of the attribute.
*
* the map returned by this method is live, and you can add new attribute
* by updating the map directly. Because of this design, there's no setter.
*
*
* @return
* always non-null
*/
public Map<QName, String> getOtherAttributes() {
return otherAttributes;
}
which gives output such as:
AbstractAddressType.java:81:      * Gets a map that contains attributes that aren't bound to any typed property on this class.
AbstractAddressType.java:82: *
AbstractAddressType.java:83: * <p>
AbstractAddressType.java:84: * the map is keyed by the name of the attribute and
AbstractAddressType.java:85: * the value is the string value of the attribute.
AbstractAddressType.java:86: *
AbstractAddressType.java:87: * the map returned by this method is live, and you can add new attribute
AbstractAddressType.java:88: * by updating the map directly. Because of this design, there's no setter.
AbstractAddressType.java:89: *
AbstractAddressType.java:90: *
AbstractAddressType.java:91: * @return
AbstractAddressType.java:92: * always non-null
AbstractAddressType.java:93: */
AbstractAddressType.java:94: public Map<QName, String> getOtherAttributes() {
and others with class comments and annotations, such as:
/**
* This class was generated by Apache CXF 3.3.4
* 2020-11-30T12:03:21.251+01:00
* Generated source version: 3.3.4
*
*/
@WebService(targetNamespace = "urn:SZRServices", name = "SZR")
@XmlSeeAlso({at.gv.egov.pvp1.ObjectFactory.class, org.w3._2001._04.xmldsig_more_.ObjectFactory.class, ObjectFactory.class, org.xmlsoap.schemas.ws._2002._04.secext.ObjectFactory.class, org.w3._2000._09.xmldsig_.ObjectFactory.class, at.gv.e_government.reference.namespace.persondata._20020228_.ObjectFactory.class})
public interface SZR {
// more code after here
with output such as:
SZR.java:13:  * This class was generated by Apache CXF 3.3.4
SZR.java:14: * 2020-10-12T11:51:35.175+02:00
SZR.java:15: * Generated source version: 3.3.4
SZR.java:16: *
SZR.java:17: */
SZR.java:18: @WebService(targetNamespace = "urn:SZRServices", name = "SZR")
SZR.java:19: @XmlSeeAlso({at.gv.egov.pvp1.ObjectFactory.class, org.w3._2001._04.xmldsig_more_.ObjectFactory.class, ObjectFactory.class, org.xmlsoap.schemas.ws._2002._04.secext.ObjectFactory.class, org.w3._2000._09.xmldsig_.ObjectFactory.class, at.gv.e_government.reference.namespace.persondata._20020228_.ObjectFactory.class})

Best answer

Using any POSIX awk in any shell on every UNIX box:

$ cat tst.awk
/[[:space:]]*class[[:space:]]*/ {
    inDef = 1
    fname = FILENAME
    sub(".*/","",fname)
    def = out = ""
}
inDef {
    out = out fname ":" FNR ": " $0 ORS

    # Remove comments (not perfect but should work for 99.9% of cases)
    sub("//.*","")
    gsub("/[*]|[*]/","\n")
    gsub(/\n[^\n]*\n/,"")

    def = def $0 ORS
    if ( /{/ ) {
        if ( gsub(/,/,"&",def) > 2 ) {
            printf "%s", out
        }
        inDef = 0
    }
}
$ find tmp -type f -name '*.java' -exec awk -f tst.awk {} +
multiple-lines.java:1: class ClazzA<R extends A,
multiple-lines.java:2: S extends B<T>, T extends C<T>,
multiple-lines.java:3: U extends D, W extends E,
multiple-lines.java:4: X extends F, Y extends G, Z extends H>
multiple-lines.java:5: extends OtherClazz<S> implements I<T> {
single-line.java:1: class ClazzB<R extends A, S extends B<T>, T extends C<T>, U extends D, W extends E, X extends F, Y extends G, Z extends H> extends OtherClazz<S> implements I<T> {
The above was run using this input:
$ head tmp/*
==> tmp/X-no-parameter.java <==
class ClazzC /* no type parameter */ extends OtherClazz<S> implements I<T> {

public void method(Type<A, B> x) {
// ... code ...
}
}

==> tmp/X-one-parameter.java <==
class ClazzD<R extends A> // only one type parameter
extends OtherClazz<S> implements I<T> {

public void method(Type<X, Y> x) {
// ... code ...
}
}

==> tmp/X-two-line-parameters.java <==
class ClazzF<R extends A, // only two type parameters
S extends B<T>> // on two lines
extends OtherClazz<S> implements I<T> {

public void method(Type<X, Y> x) {
// ... code ...
}
}

==> tmp/X-two-parameters.java <==
class ClazzE<R extends A, S extends B<T>> // only two type parameters
extends OtherClazz<S> implements I<T> {

public void method(Type<X, Y> x) {
// ... code ...
}
}

==> tmp/multiple-lines.java <==
class ClazzA<R extends A,
S extends B<T>, T extends C<T>,
U extends D, W extends E,
X extends F, Y extends G, Z extends H>
extends OtherClazz<S> implements I<T> {

public void method(Type<Q, R> x) {
// ... code ...
}
}

==> tmp/single-line.java <==
class ClazzB<R extends A, S extends B<T>, T extends C<T>, U extends D, W extends E, X extends F, Y extends G, Z extends H> extends OtherClazz<S> implements I<T> {

public void method(Type<Q, R> x) {
// ... code ...
}
}
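The above is just a best effort without writing a parser for the language, with only the OP's posted sample input/output to go on for what needs to be handled.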
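The unwanted output described in the question's update appears to come from the start pattern matching the word class anywhere, including inside comments such as "on this class.", after which every comma up to the next { is counted. What follows is a stricter variant as a hedged sketch, not Ed Morton's code: the function name generic_class_headers and the helper top_level_commas are hypothetical, [ \t] stands in for [[:space:]], and lines that begin like a comment are simply skipped. It counts commas only at nesting depth 1 of the type parameter list that directly follows the class name, so "more than two type parameters" is taken to mean at least two such commas. Block comments or annotations inside the header are not handled.

```shell
# Hypothetical helper: print the headers of classes that declare more than
# two type parameters. Usage: generic_class_headers FILE...
generic_class_headers() {
    awk '
    # Count commas at nesting depth 1 of the type parameter list that
    # directly follows the class name; returns 0 if there is no such list.
    function top_level_commas(s,    i, c, depth, n) {
        if (!match(s, /class[ \t]+[A-Za-z_][A-Za-z0-9_]*[ \t]*/)) return 0
        s = substr(s, RSTART + RLENGTH)
        if (substr(s, 1, 1) != "<") return 0
        n = 0; depth = 0
        for (i = 1; i <= length(s); i++) {
            c = substr(s, i, 1)
            if (c == "<") depth++
            else if (c == ">") { if (--depth == 0) break }
            else if (c == "," && depth == 1) n++
        }
        return n
    }
    # Start (or restart) collecting on a line that has the keyword "class"
    # followed by an identifier, unless the line begins like a comment.
    (" " $0) ~ /[ \t]class[ \t]+[A-Za-z_]/ && $0 !~ /^[ \t]*[*\/]/ {
        inDef = 1
        def = out = ""
        fname = FILENAME
        sub(/.*\//, "", fname)
    }
    inDef {
        out = out fname ":" FNR ": " $0 ORS
        line = $0
        sub(/\/\/.*/, "", line)          # drop a trailing // comment
        def = def line " "
        if (index(line, "{") > 0) {      # header is complete at the first "{"
            if (top_level_commas(def) >= 2) printf "%s", out
            inDef = 0
        }
    }
    ' "$@"
}
```

It can be called as generic_class_headers tmp/*.java. Tracking the nesting depth means that the comma inside a nested argument such as B<T, U> and the commas in an implements list are not counted as type parameter separators.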

Regarding java - gawk or grep: single line and ungreedy, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/64936027/
