Python: Find the longest word in a string

Translated by: bug小助手. Updated: 2023-10-26 22:24:47

I'm preparing for an exam but I'm having difficulties with one past-paper question. Given a string containing a sentence, I want to find the longest word in that sentence and return that word and its length. Edit: I only needed to return the length but I appreciate your answers for the original question! It helps me learn more. Thank you.

For example: string = "Hello I like cookies". My program should then return "cookies" and the length 7.

Now the thing is that, for a full score, I am not allowed to use any function from the class String and I can only go through the string once. I am not allowed to use string.split() (otherwise there wouldn't be any problem) and the solution shouldn't have too many for and while statements. The strings contain only letters and blanks, and words are separated by one single blank.

Any suggestions? I'm lost, i.e. I don't have any code.

Thanks.

EDIT: I'm sorry, I misread the exam question. You only have to return the length of the longest word it seems, not the length + the word.

EDIT2: Okay, with your help I think I'm onto something...

def longestword(x):
      alist = []
      length = 0
      for letter in x:
             if letter != " ":
                     length += 1
             else:
                     alist.append(length)
                     length = 0
      return alist

But it returns [5, 1, 4] for "Hello I like cookies" so it misses "cookies". Why? EDIT: Ok, I got it. It's because there's no more " " after the last letter in the sentence and therefore it doesn't append the length. I fixed it so now it returns [5, 1, 4, 7] and then I just take the maximum value.
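
The fix described above could look roughly like the following. This is only a sketch of one way to do it, not the asker's actual code: the function name longest_word_length is made up for illustration, and it assumes that lists and the built-in max() are allowed.

```python
def longest_word_length(sentence):
    """Collect the length of every word, then return the largest one."""
    lengths = []
    length = 0
    for letter in sentence:
        if letter != " ":
            length += 1
        else:
            lengths.append(length)
            length = 0
    # The last word has no trailing blank, so append its length after the loop.
    lengths.append(length)
    return max(lengths)


print(longest_word_length("Hello I like cookies"))  # 7
```

Appending once more after the loop is what makes the last word count; without that line the function would see only [5, 1, 4].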

I suppose using lists but not .split() is okay? The question only said that functions from "String" weren't allowed. Or do lists count as part of strings?

More comments

Technically, you can import the string module and call split from there because you wouldn't be using it from the string class. Maybe this is a test of knowing the documentation.

If you test only that the letter is not " ", it will fail with punctuation: "Hello, I like cookies!". You'd better test that the character is not a letter.

@FrancisColas my bad again. It says in the exam question that "The text contains only letters and blanks, and each word is separated by one single blank". Then it's fine?

Yes, in that case it works fine: your test is also faster than mine and simpler than using regular expressions.

Recommended answers

You can try to use regular expressions:

import re

string = "Hello I like cookies"
word_pattern = r"\w+"  # raw string, so the backslash is passed to the regex engine as-is

regex = re.compile(word_pattern)
words_found = regex.findall(string)

if words_found:
    longest_word = max(words_found, key=lambda word: len(word))
    print(longest_word)

Finding a max in one pass is easy:

current_max = 0
for v in values:
    if v>current_max:
        current_max = v

But in your case, you need to find the words. Remember this quote (attributed to J. Zawinski):

Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.

Besides using regular expressions, you can simply check which characters are letters. A first approach is to go through the string and detect the start or end of words:

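import string

current_word = ''
current_longest = ''
for c in mystring:
    if c in string.ascii_letters:
        current_word += c
    else:
        # A non-letter ends the current word.
        if len(current_word) > len(current_longest):
            current_longest = current_word
        current_word = ''  # always reset, even if the word was not the longest
# The last word may not be followed by a non-letter, so check it here.
if len(current_word) > len(current_longest):
    current_longest = current_word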

A final way is to split words in a generator and find the max of what it yields (here I used the max function):

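import string

def split_words(mystring):
    current = []
    for c in mystring:
        if c in string.ascii_letters:
            current.append(c)
        elif current:
            yield ''.join(current)
            current = []  # start collecting the next word
    if current:
        # The last word is not followed by a separator.
        yield ''.join(current)

max(split_words(mystring), key=len)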

Just search for groups of non-whitespace characters, then find the maximum by length:

longest = len(max(re.findall(r'\S+',string), key = len))

For Python 3. If two words in the sentence have the same length, it returns the one that appears first.

def findMaximum(word):
    li=word.split()
    li=list(li)
    op=[]
    for i in li:
        op.append(len(i))
    l=op.index(max(op))
    print (li[l])
findMaximum(input("Enter your word:"))

It's quite simple:

def long_word(s):
    n = max(s.split())
    return(n)

In [48]: long_word('a bb ccc dddd')

Out[48]: 'dddd'

I found an error in a previously posted solution; here's the correction:

import string

def longestWord(text):
    
    current_word = ''
    current_longest = ''
    for c in text:
        if c in string.ascii_letters:
            current_word += c
        else:
            if len(current_word)>len(current_longest):
                current_longest = current_word
            current_word = ''    

    if len(current_word)>len(current_longest):
        current_longest = current_word
    return   current_longest

I can imagine a few different alternatives. Regular expressions can probably do much of the word splitting you need. This could be a simple option if you understand regexes.

An alternative is to treat the string as a list, iterate over it keeping track of your index, and looking at each character to see if you're ending a word. Then you just need to keep the longest word (longest index difference) and you should find your answer.
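
A rough sketch of that index-tracking idea follows. The function name longest_word_by_index and its variables are hypothetical, and it assumes the input described in the question: only letters, with words separated by single blanks. It also assumes that slicing the string at the end is acceptable.

```python
def longest_word_by_index(text):
    """Return (word, length) for the longest word, tracking start/end indices."""
    best_start, best_end = 0, 0  # bounds of the longest word seen so far
    start = 0                    # index where the current word begins
    for index, char in enumerate(text):
        if char == " ":
            # The current word spans text[start:index].
            if index - start > best_end - best_start:
                best_start, best_end = start, index
            start = index + 1
    # The final word ends at the end of the string, not at a blank.
    if len(text) - start > best_end - best_start:
        best_start, best_end = start, len(text)
    return text[best_start:best_end], best_end - best_start


print(longest_word_by_index("Hello I like cookies"))  # ('cookies', 7)
```

Because the comparison is strict, the first of several equally long words is the one that is kept.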

Regular expressions seem to be your best bet. First use re to split the sentence:

>>> import re
>>> string = "Hello I like cookies"
>>> string = re.findall(r'\S+',string)

\S+ matches each run of non-whitespace characters, and findall puts the matches in a list:

>>> string
['Hello', 'I', 'like', 'cookies']

Now you can find the length of the longest word in the list and then use a list comprehension to retrieve the word itself:

>>> maxlen = max(len(word) for word in string)
>>> maxlen
7
>>> [word for word in string if len(word) == maxlen]
['cookies']

This method uses only one for loop, doesn't use any methods of the string class, and accesses each character only once. You may have to modify it depending on what characters count as part of a word.

s = "Hello I like cookies"
word = ''
maxLen = 0
maxWord = ''
for c in s + ' ':  # the extra blank flushes the last word
    if c == ' ':
        if len(word) > maxLen:
            maxWord = word
            maxLen = len(word)  # remember the best length seen so far
        word = ''
    else:
        word += c

print("Longest word:", maxWord)
print("Length:", maxLen)

Given that you are not allowed to use string.split(), I guess using a regexp to do the exact same thing should be ruled out as well.

I do not want to solve your exercise for you, but here are a few pointers:

Suppose you have a list of numbers and you want to return the highest value. How would you do that? What information do you need to track?

Now, given your string, how would you build a list of all word lengths? What do you need to keep track of?

Now, you only have to intertwine both logics so computed word lengths are compared as you go through the string.
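
For readers who want to check their own attempt, here is one possible way the two ideas can be intertwined. It is a sketch under the question's stated assumptions (only letters and single blanks), not the answerer's solution, and the name longest_word_length_one_pass is invented for this example.

```python
def longest_word_length_one_pass(sentence):
    """Track the running maximum word length while scanning the string once."""
    longest = 0  # best word length seen so far
    current = 0  # length of the word currently being read
    for char in sentence:
        if char == " ":
            current = 0  # a blank ends the current word
        else:
            current += 1
            if current > longest:
                longest = current
    return longest


print(longest_word_length_one_pass("Hello I like cookies"))  # 7
```

Updating the maximum on every letter, rather than only at blanks, means the last word needs no special handling and no list is required.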

My proposal ...

import re

def longer_word(sentence):
    word_list = re.findall(r"\w+", sentence)
    # Sort by length, longest first (the cmp argument only exists in Python 2).
    word_list.sort(key=len, reverse=True)
    longest = word_list[0]
    print("The longer word is '" + longest + "' with a size of", len(longest), "characters.")

longer_word("Hello I like cookies")

import re

def longest_word(sen):
  res = re.findall(r"\w+",sen)
  n = max(res,key = lambda x : len(x))
  return n

print(longest_word("Hey!! there, How is it going????"))

Output: there

Here I have used regex for the problem. The variable "res" holds the list of all the words that findall extracts from the string; no call to split() is needed.

findall is a function of the re module that returns every match of a pattern in a string. Here the pattern \w+ tells the regex engine to match runs of word characters (letters, digits and underscores), so spaces and punctuation are left out.

The variable "n" holds the longest word in "res", which no longer contains non-word characters such as %, & or !. The lambda expression passed as key makes max() compare the words by len().

>>> # Import regular expressions for the problem.
>>> import re
>>> # Initialize a sentence.
>>> sen = "fun&!! time zone"
>>> # findall collects all the words into a list.
>>> res = re.findall(r"\w+", sen)
>>> res
['fun', 'time', 'zone']
>>> max(res)
'zone'
>>> # Without a key, max() compares the strings alphabetically,
>>> # so "zone" ranks higher than "time".
>>> n = max(res, key=lambda x: len(x))
>>> n
'time'

Here we get "time" because the key makes max() compare the words by length instead of alphabetically. "time" and "zone" are both four letters long, and max() returns the first of the tied items.
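
The difference between the two calls can be seen side by side in a small standalone sketch. The variable names by_alphabet and by_length are chosen only for illustration.

```python
words = ["fun", "time", "zone"]

# Without a key, strings are compared lexicographically.
by_alphabet = max(words)

# With key=len, items are compared by length; on a tie the first one wins.
by_length = max(words, key=len)

print(by_alphabet)  # zone
print(by_length)    # time
```

Passing key=len is equivalent to the lambda used above, just shorter.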

In case you want to ignore punctuation as well, here is a more Pythonic way. It returns the longest word; you can call len on the result to get its length.

from string import punctuation
def longest_word(text):  # renamed from str to avoid shadowing the built-in
   return max("".join([s for s in text if s not in punctuation]).split(" "), key=len)

If two words have the same length, the first one that appears is returned.

Sample inputs and outputs:

print(longest_word("test1 test2")) # -> test1
print(longest_word("t.est1: test2")) # -> test1
print(longest_word("hy: hello")) # -> hello

list1 = ['Happy', 'Independence', 'Day', 'Zeal']
listLen = []
for i in list1:
    listLen.append(len(i))
print(list1[listLen.index(max(listLen))])

Output: Independence

More comments

Thanks, I'll look at this some more. We haven't talked about regular expressions I think.

That first method fails to detect "cookies" because it's the last word in the string.

Indeed (I had read "I like cookies." with punctuation). I edited with a fix.

I like the generator/max approach, it is elegant and modular. Not sure this is the intended approach for the exercise, but it would deserve full score imo. You would have to tweak it a bit so the word is returned as well, though.

Thank you, very thorough answer.

The OP stated that using split is not allowed.

Okay, I'll just use regex then.

This returns the longest word's length, not the actual word.

@ChaseRoberts Ok no problem, remove the call to len

This is not the answer to the question; it returns the highest word in alphabetical order.

Thanks, your second paragraph inspired me and I think I solved it.

Mmt, regarding your code: you cannot find "cookies" because there is no space at the end of your string, so the last word is never stored in your list.
