I have a dataset, and in one of the columns I have values like '1-M3 [J]' and '1 - M3 [J]'. Both are the same value, but the second one has spaces around the hyphen, so the data is inconsistent.
I used:
Split(column,'[-]')[0]
which gives me 1 for only one of the two values.
Split(column,'[ - ]')[0]
which also gives me 1 for only one of them.
I expect to get 1 from both values. Could you help with this? Should I use trim in the join condition?
Expected output: 1,1
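A small Python sketch can reproduce the behaviour described above. Hive's `split` takes a Java regular expression, and Python's `re` treats these two patterns the same way. `'[-]'` matches only the hyphen, so the spaced value keeps a trailing space (`'1 '`). `'[ - ]'` is a character class whose ` - ` is a range from space to space, so it matches only a space. The two fixes below correspond to the Hive expressions `trim(split(column, '-')[0])` and `split(column, '\\s*-\\s*')[0]`. These mappings are an assumption and have not been run against a Hive table here.

```python
import re

# The two spellings from the question: same value, different spacing.
values = ["1-M3 [J]", "1 - M3 [J]"]

# '[-]' splits on the hyphen only, so the spaced value keeps a trailing space.
print([re.split(r"[-]", v)[0] for v in values])    # ['1', '1 ']

# '[ - ]' is a character class; ' - ' inside it is a range from space to space,
# so it matches a single space and never splits '1-M3' at the hyphen.
print([re.split(r"[ - ]", v)[0] for v in values])  # ['1-M3', '1']

# Fix 1: split on the hyphen, then trim
# (assumed Hive form: trim(split(column, '-')[0])).
trimmed = [v.split("-")[0].strip() for v in values]
print(trimmed)   # ['1', '1']

# Fix 2: absorb optional whitespace around the hyphen in the pattern itself
# (assumed Hive form: split(column, '\\s*-\\s*')[0]).
tolerant = [re.split(r"\s*-\s*", v)[0] for v in values]
print(tolerant)  # ['1', '1']
```

Either fixed form yields 1 for both spellings, which matches the expected output 1,1.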
Hi, it is very unclear from your question what you are trying to achieve. Please update your question with examples of what your issue is and what you want the result to be. Providing details of a solution (using split) that doesn't work doesn't really help anyone. If you are just trying to make "1 - M3 [J]" = "1-M3 [J]", then just use REPLACE(column1, " - ", "-").
Clean your data with an UPDATE statement and then run the query. Don't clean data during a join; it's massively inefficient and prevents the use of indexes.
Answers:
Use REPLACE to replace the extra spaces with an empty string ('') and then use the result in the join:
REPLACE(Column1, " ", "")
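Removing every space collapses both spellings to '1-M3[J]', so they compare equal as join keys. The Python sketch below illustrates this. It uses hypothetical tables `left` and `right` and a hypothetical helper `normalize` that stands in for `REPLACE(Column1, " ", "")`. In Hive, `regexp_replace(Column1, ' ', '')` should be an equivalent expression, but that is an assumption. The key is normalized once while building the lookup, in the spirit of the earlier comment about not cleaning data inside the join.

```python
# Hypothetical tables: the key column holds the same code with inconsistent spacing.
left = [("1-M3 [J]", "row A")]
right = [("1 - M3 [J]", "row B")]


def normalize(key: str) -> str:
    """Remove every space, like REPLACE(Column1, " ", "") in the answer."""
    return key.replace(" ", "")


# Normalize the right-hand keys once, while building the lookup table.
right_by_key = {normalize(k): v for k, v in right}

# Inner join: keep left rows whose normalized key exists on the right.
joined = [
    (k, lv, right_by_key[normalize(k)])
    for k, lv in left
    if normalize(k) in right_by_key
]
print(joined)  # [('1-M3 [J]', 'row A', 'row B')]
```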
You can use the following Python snippet:
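import re

# The two spellings from the question (the spaces are what differ).
data = ['1-M3 [J]', '1 - M3 [J]']

number = []
for item in data:
    # Take only the leading digits; re.findall(r'\d+', item) would also
    # pick up the '3' from 'M3'.
    match = re.match(r'\s*(\d+)', item)
    if match:
        number.append(int(match.group(1)))

print(number)  # [1, 1]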
No, I don't need to replace anything. I need only the value 1 in my output, and I will use it in the join.
Can you not use substr(col, 1, 1)?
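`substr(col, 1, 1)` returns only the first character. That works for the two values in the question, but it would truncate a multi-digit prefix. The Python sketch below compares it with extracting all leading digits, which in Hive would presumably be `regexp_extract(col, '^\\s*(\\d+)', 1)`; that mapping is an assumption. The value '12 - M3 [J]' is hypothetical and not from the question, and both helper functions are illustrative names.

```python
import re


def substr_prefix(col: str) -> str:
    """Python analogue of Hive substr(col, 1, 1): the first character only."""
    return col[0:1]


def leading_number(col: str) -> str:
    """Rough analogue of regexp_extract(col, '^\\s*(\\d+)', 1).

    Returns all leading digits, or '' when the value does not start with one.
    """
    m = re.match(r"\s*(\d+)", col)
    return m.group(1) if m else ""


# The first two values come from the question; '12 - M3 [J]' is hypothetical.
values = ["1-M3 [J]", "1 - M3 [J]", "12 - M3 [J]"]

print([substr_prefix(v) for v in values])   # ['1', '1', '1']
print([leading_number(v) for v in values])  # ['1', '1', '12']
```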
In Hive, not Python.
Oh, okay. Sorry, I'm not so good with Hive.