I'm using Google BigQuery to mine the reddit comments dataset. I'll start with the query I'm working on:
SELECT
DATE(SEC_TO_TIMESTAMP(created_utc)) AS date,
subreddit,
author AS comment_author,
ups AS upvotes,
LOWER(body)
FROM
[fh-bigquery:reddit_comments.2015_01]
WHERE
body CONTAINS 'acid'
OR body CONTAINS 'ecstasy'
OR body CONTAINS 'fire'
OR body CONTAINS 'heroin'
LIMIT 10;
I need to search the reddit dataset for a list of about 30 drug-related words (for brevity, I've cut the list down to four here).
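I'm having trouble with two things:
1. Returning only whole-word matches, not hits where the term is just part of a different word.
2. Adding a column that shows which word matched.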
I also tried using a regular expression to match the words, but that doesn't seem to help either:
WHERE (REGEXP_MATCH(body,'drug|acid|ecstacy|fire|heroin|joint|marijuana|weed|bud|ganja|hash|blazing|blaze|meth|molly|pcp|shrooms|speed|uppers|valium|xanax|tripping|smoke|liquor|beer|alcohol|booze|acid|benzos|blow|cocaine|crack|crank|dank|dope|downers'))
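Both attempts over-match for the same reason: CONTAINS and an unanchored alternation test for substrings, so 'acid' is found inside "placid" and 'fire' inside "firefox". Below is a minimal local sketch in Python (not BigQuery code) that builds the alternation from a deduplicated word list and anchors it with \b word boundaries. The word list, the sample sentences and the helper name `build_pattern` are illustrative assumptions.

```python
import re

# Illustrative subset of the ~30 search terms; "acid" is listed twice on purpose
# to show deduplication (the hand-written pattern above also repeats it).
WORDS = ["acid", "ecstasy", "fire", "heroin", "acid"]

def build_pattern(words):
    """Return a compiled regex that matches any of the words as a whole word only."""
    unique = list(dict.fromkeys(w.lower() for w in words))  # dedupe, keep order
    alternation = "|".join(re.escape(w) for w in unique)
    # The group makes \b apply to every alternative, not just the first and last.
    return re.compile(r"\b(" + alternation + r")\b")

bare = re.compile("|".join(WORDS))  # substring semantics, like CONTAINS
bounded = build_pattern(WORDS)

for text in ["a placid lake", "firefox crashed", "he tried acid once"]:
    print(f"{text!r}: bare={bool(bare.search(text))}, bounded={bool(bounded.search(text))}")
```

BigQuery's regular expressions use the RE2 syntax, which also supports \b, so the same bounded pattern string can be passed to REGEXP_MATCH.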
Any help would be appreciated. Thanks, everyone!
Best answer
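Addressing the two points in the question:
1. Output only whole-word matches, not occurrences that are part of another, different word. This is easy to do with the REGEXP_MATCH function and \b word boundaries.
2. Have a column containing all matched words. (I think listing all matched words makes more sense than listing only one, as the question asked.)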
SELECT
[date],
subreddit,
comment_author,
upvotes,
GROUP_CONCAT(word) AS matches,
body
FROM (
SELECT
[date],
subreddit,
comment_author,
upvotes,
body,
word
FROM (
SELECT
DATE(SEC_TO_TIMESTAMP(created_utc)) AS [date],
subreddit,
author AS comment_author,
ups AS upvotes,
LOWER(body) AS body
FROM
[fh-bigquery:reddit_comments.2015_01]
WHERE REGEXP_MATCH(body, r'\b(drug|ecstacy|fire|heroin|joint|marijuana|weed|bud|ganja|hash|blazing|blaze|meth|molly|pcp|shrooms|speed|uppers|valium|xanax|tripping|smoke|liquor|beer|alcohol|booze|acid|benzos|blow|cocaine|crack|crank|dank|dope|downers)\b')
) x
CROSS JOIN (
SELECT SPLIT(list,'|') AS word FROM
(SELECT 'drug|ecstacy|fire|heroin|joint|marijuana|weed|bud|ganja|hash|blazing|blaze|meth|molly|pcp|shrooms|speed|uppers|valium|xanax|tripping|smoke|liquor|beer|alcohol|booze|acid|benzos|blow|cocaine|crack|crank|dank|dope|downers' AS list)
) y
HAVING body CONTAINS word
)
GROUP BY [date], subreddit, comment_author, upvotes, body
LIMIT 1000
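To see why the matches column is only best-effort, here is a local Python emulation of the query's logic (an illustration, not how BigQuery executes it). A row is kept only if the \b regex finds at least one whole word, but the CROSS JOIN with HAVING body CONTAINS word then tests every list word as a plain substring, and GROUP_CONCAT joins whatever survives. The shortened word list, the sample comments and the function name `matches_column` are hypothetical.

```python
import re

# Shortened, hypothetical word list; the real query uses ~35 terms.
WORD_LIST = "drug|heroin|acid|fire"
FILTER = re.compile(r"\b(" + WORD_LIST + r")\b")  # the REGEXP_MATCH filter

def matches_column(body):
    """Mimic the query: regex filter, then CROSS JOIN + CONTAINS, then GROUP_CONCAT."""
    body = body.lower()                     # LOWER(body) AS body
    if not FILTER.search(body):             # WHERE REGEXP_MATCH(...)
        return None                         # row is dropped
    words = WORD_LIST.split("|")            # SPLIT(list, '|') AS word
    hits = [w for w in words if w in body]  # HAVING body CONTAINS word (substring test)
    return ",".join(hits)                   # GROUP_CONCAT(word); order may differ in BigQuery

print(matches_column("Most drugs are bad, heroin especially"))  # heroin is exact, drug is not
print(matches_column("Firefox is my browser"))                  # no whole-word hit, so dropped
```

The first comment passes the filter because of "heroin", and "drug" is then added only because it occurs inside "drugs", which is exactly the multi-word case described next.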
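The solution above gives a best-effort list of matched words, so note:
- If the matches column contains a single word, that word is definitely a whole-word match.
- If it contains several words, at least one of them is still a whole-word match, but the others may not be (they may only occur inside longer words).
I think that for lengthy bodies this is still valuable, at least as a hint about what to look for. For example: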
drug,meth,heroin,alcohol,benzos it also inhibits the reuptake of serotonin and norepinephrine which gives a hell of a lot worse withdrawal symptoms than most other drugs(incl. heroin, meth, coke and etc.). from what i have heard the only things that rival tramadol it terms of withdrawal are benzos and alcohol.
liquor,beer,alcohol,booze 1. reinforce #3 - it is not cheap to live here. not by any stretch. expect to pay more than the rest of the country pays for everything. even franchises that operate nation-wide have special wa/perth pricing. 2. petrol has literally just dropped to $1 this past month, i wouldn't go as far as quoting that as our average price just yet. average is still between $1.20-1.30. 3. parking is free at beaches & parks, do not expect to get free parking anywhere in the city though. if you're using public parking in the city all day, expect to pay $50 unless you get in early. 4. forget bribing the cops, don't even call them "mate". last time i was pulled over (last week, random stop) i said "evening mate" as i was handing him my license and was responded with "don't call me mate, i'm not your friend, i don't know you". 5. unlike the rest of the world, regular stores do not sell alcohol here. liquor stores only, don't expect to buy beer from a gas station or grocery store. 6. rent is expensive, food is expensive, booze is expensive, being alive is expensive.
drug,meth,heroin,beer that's simply not true. first there's a difference between legalization and decriminalization. second, some european countries have places to go to safely use drugs. there is middle ground between allowing heroin to be sold all over town and having users go to prison. heroin, meth and some other drugs are not good things for society and their use should encouraged by making it as easy to buy as a 6 pack of beer. i'm not really sure why you can't see a middle ground because it's clearly not as black and white as you say. you can go after the dealers while leaving the users alone.
drug,fire,joint,smoke not a story about a rave, but still relevant i think: i was working a job called "fire watch," which is just what it sounds like, at a nine inch nails concert a few years ago. our comrades, the security workers, were far from seasoned professionals. they were mostly college temps with a yellow security tee shirt and a flashlight; they didn't even have radios. the job is basically to make sure people don't go into restricted areas. ...but this one boy scout took it upon himself to tame the metal masses. mid-concert, he pulled me close and shouted "they're smoking pot!" i shrugged, and shot him an "and?" look. i guess he thought i should care because technically a joint is a tiny dangerous drug fire, and i was on the fire crew. he then proceeded to disappear into the crowd, shoving people out of the way on his heroic journey toward the countless smoke puff origins. the next time i saw him he was bleeding out of his face and getting a flashlight in the eyes from an onsite emt. i guess it's pretty harsh to say that he deserved the beating, but it's hard to argue that he didn't go asking for it. i guess the moral of my story is that security people are just people, and some people's shittyness is inflamed when combined with authority. it sounds like your event just happened to be warded by a gaggle of douches, probably being captained by king fuckwad who really wanted to be a cop, but couldn't pass the exams.
Note: if you need only the list of exact matches, that is still relatively easy to do with BigQuery User-Defined Functions.
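Whatever mechanism is used (a UDF, or, to my knowledge, the REGEXP_EXTRACT_ALL function in BigQuery's Standard SQL dialect), an exact-only list comes down to collecting every whole-word match of the bounded pattern. A minimal Python sketch of that logic follows; the word list, the sample sentences and the helper name `exact_matches` are hypothetical.

```python
import re

# Hypothetical subset of the search terms.
WORDS = ["drug", "heroin", "meth", "alcohol", "benzos"]
PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, WORDS)) + r")\b")

def exact_matches(body):
    """Return each distinct whole-word match, comma-separated, in order of first appearance."""
    found = PATTERN.findall(body.lower())  # one entry (group 1) per occurrence
    return ",".join(dict.fromkeys(found))  # drop repeats, keep first-seen order

print(exact_matches("Heroin, meth and other drugs; heroin again"))
print(exact_matches("methane and alcoholic drinks"))
```

Unlike the substring test in the query above, "drugs" does not produce "drug" here, and "methane" or "alcoholic" produce nothing, so every reported word is a whole-word match.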
For the original question (sql - Need to format a table in BigQuery for specific words in a text body), see Stack Overflow: https://stackoverflow.com/questions/34933379/