I have a table that is returning multiple rows for the same ID because of the differences in the name column. Some names include middle initials after the first name or suffix after the last name. In the below example, I want only the first row because it has the complete name.
我有一个表,由于名称列中的差异,它为相同的ID返回多个行。有些名字在名字后面加上中间字母,或者在姓氏后面加上后缀。在下面的示例中,我只想要第一行,因为它有完整的名称。
Column A |
Column B |
2345678999 |
Johnson JR, Carl A |
2345678999 |
Johnson, Carl |
I tried to rank by length of name but it returned
Error for undefined function LEN
我尝试按名称长度进行排名,但因未定义函数LEN而返回错误
with RANKEDROWS AS (
select
ID
,NAME
,rank() over (order by LEN(name) DESC) AS ROWRANK
FROM MEMBERS
)
select ID, NAME
froM RANKEDROWS
WHERE ROWRANK = 1;
I tried to separate the names but it returned two rows - one with what is before the comma and one with what is after it instead of creating additional columns with the name parts.
我试图分隔名称,但它返回两行--一行包含逗号之前的内容,另一行包含逗号后面的内容,而不是使用名称Parts创建额外的列。
select id, explode(split(NAME, ',')) as m_name
from members
更多回答
What version of SparkSQL do you use? Try to use LENGTH
rather than LEN
您使用的是什么版本的SparkSQL?尽量使用长度而不是长度
What is the expected output?
预期的输出是什么?
优秀答案推荐
You need to use LENGTH
instead of LEN
您需要使用长度而不是长度
更多回答
我是一名优秀的程序员,十分优秀!