Regex help: Identifying websites in text

Reposted by: 行者123  Updated: 2023-12-01 06:54:43

I am trying to write a function that removes websites from a piece of text. I have:

removeWebsites<- function(text){
  text = gsub("(http://|https://|www.)[[:alnum:]~!#$%&+-=?,:/;._]*",'',text)
  return(text)
}

This handles a large share of the cases, but it misses one common form: sites written as xyz.com

I don't want to append .com to the end of the regex above, because that would narrow its scope. I did try writing a few more regexes, such as:

gsub("[[:alnum:]~!#$%&+-=?,:/;._]*.com",'',testset[10])

This works, but it also turns email IDs of the form abc@xyz.com into abc@. I don't want that, so I changed it to

gsub("*((^@)[[:alnum:]~!#$%&+-=?,:/;._]*).com",'\\1',testset[10])

This preserves the email IDs, but it stops recognizing websites of the form xyz.com

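I understand that I need some kind of set difference here, along the lines of what is explained here, but I couldn't implement it (mainly because I couldn't fully understand it). Any ideas on how to solve my problem?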

Edit: I tried a negative lookahead:

gsub("[[:alnum:]~!#$%&+-=?,:/;._](?!@)[^(?!.*@)]*.com",'',testset[10])

I get an "invalid regex" error. I believe a little help correcting it might get it working...
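
A likely cause of the "invalid regex" error: unless told otherwise, gsub() uses R's default POSIX-style (TRE) engine, which has no lookaround syntax. Passing perl = TRUE switches to PCRE, which does. Note also that [^(?!.*@)] is a bracket expression, so the (?!...) inside it is read as literal characters, not as a lookahead. The following is a minimal sketch rather than the asker's or the answerer's code: the helper name remove_bare_domains, the sample text, and the restriction to .com are assumptions for illustration. It uses a negative lookbehind so that a domain is removed only when it does not directly follow "@" or another token character.

```r
# Hypothetical helper; the name and the ".com only" scope are assumptions.
remove_bare_domains <- function(text) {
  # (?<![[:alnum:]@._-]) : do not start right after '@' or in the middle of a token
  # [[:alnum:]][[:alnum:].-]* : the host name, e.g. "xyz" or "mail.xyz"
  # \\.com\\b : a literal ".com" followed by a word boundary
  pattern <- "(?<![[:alnum:]@._-])[[:alnum:]][[:alnum:].-]*\\.com\\b"
  gsub(pattern, "", text, perl = TRUE)  # perl = TRUE enables lookbehind
}

remove_bare_domains("visit xyz.com or mail abc@xyz.com")
# returns "visit  or mail abc@xyz.com"
```

The site is removed and the email address is left intact; the double space is what remains once xyz.com is deleted.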

Best answer

I can't believe it. There is actually a simple solution.

gsub(" ([[:alnum:]~!#$%&+-=?,:/;._]+)((.com)|(.net)|(.org)|(.info))",' ',text)

It works as follows:

Start with a space.
Allow all sorts of characters, except "@".
End with .com, .net, .org, or .info.
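
The three steps above can be sketched as a small function. This is a tightened variant, not the answer's code as posted: the function name remove_plain_sites and the sample text are assumptions, the dots are escaped so that they match only a literal ".", and the hyphen is moved to the end of the bracket expression so that +-= is no longer read as a character range.

```r
# Hypothetical wrapper; the function name and the sample text are assumptions.
remove_plain_sites <- function(text) {
  pattern <- paste0(
    " ",                             # 1. start with a space
    "[[:alnum:]~!#$%&+=?,:/;._-]+",  # 2. one or more allowed characters; '@' is not among them
    "\\.(com|net|org|info)"          # 3. end with a literal dot plus one of the listed TLDs
  )
  gsub(pattern, " ", text)
}

remove_plain_sites("see xyz.com or mail abc@xyz.com")
# returns "see  or mail abc@xyz.com"
```

Because "@" is not in the allowed set, a match that starts at the space before abc cannot extend across the "@", so the email address survives.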

Do try to break it! I'm sure there are cases where this breaks too.
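
Taking up that invitation, here are three inputs that appear to break the posted pattern. The sample sentences are made up for illustration; the pattern is copied unchanged from the answer.

```r
# The answer's pattern, copied verbatim.
pattern <- " ([[:alnum:]~!#$%&+-=?,:/;._]+)((.com)|(.net)|(.org)|(.info))"

# 1. No leading space: a site at the very start of the text is not removed.
gsub(pattern, " ", "xyz.com is down")
# returns "xyz.com is down"

# 2. No boundary after the TLD: only a prefix of the longer name is removed.
gsub(pattern, " ", "join xyz.company today")
# returns "join pany today"

# 3. Unescaped dot: "." matches any character, so ordinary words are damaged.
gsub(pattern, " ", "visit welcome now")
# returns "visit e now"
```

Escaping the dots, allowing a match at the start of the string, and requiring a boundary after the TLD would address these three cases.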

Regarding "Regex help: Identifying websites in text", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/14435212/
