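I'm not sure I'm asking this the right way, but I want to search a log file for every word in an array. So far, I ask the user to drag the file in question into the terminal, then build an array from their input. The program should print every line in which a word is found.
Once I get that working I'll add formatting, a counter, or some kind of summary of what I found in the file, and so on.
Here's what I have so far; the problem is that when I run it, it doesn't actually find any of the words. I've been looking at examples that use the re module, but I think that may be overly complicated for what I have in mind: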
def wordsToFind():
    needsWords = True
    searchArray = []
    print "Add words to search ('done') to save/continue."
    while needsWords == True:
        word = raw_input("Enter a search word: ")
        if word.lower() == "done":
            needsWords = False
            break
        else:
            searchArray.append(word)
            print word + " added"
    return searchArray

def getFile():
    file_to_read = raw_input("Drag file here:").strip()
    return file_to_read

def main():
    filePath = getFile()
    searchArray = wordsToFind()
    print "Words searched for: ", searchArray
    searchCount = []
    with open(filePath, "r") as inFile:
        for line in inFile:
            for item in searchArray:
                if item in line:
                    print item

main()
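Obviously, any optimization suggestions or advice on better Python coding are very welcome here. I only know what I know, and I appreciate all the help!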
Best answer
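This is exactly the kind of problem map-reduce is meant to solve. In case you're not familiar with it, map-reduce is a simple two-step process. Suppose you have a list that stores the words you're interested in finding in a text. Your mapper function can iterate over this word list for each line of the text, and if a word appears in the line, it returns a value such as ['word', lineNum], which is stored in a results list. The mapper is essentially a wrapper around a for loop. You can then write a reducer function that takes the results list and "reduces" it; in this case, it could turn a results list that looks like [['word1', 1], ..., ['word1', n], ...] into an object that looks like {'word1': [1, 2, 5], 'word3': [7], ...}.
This approach is advantageous because you abstract away the process of iterating over a list while performing a common operation on each item, and if your analysis needs change (as they often do), you only need to change the mapper/reducer functions without touching the rest of the code. In addition, this approach is highly parallelizable, should that ever become an issue (just ask Google!).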
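To illustrate the parallelism remark, here is a minimal sketch that runs the map step over the lines with a thread pool from the standard library. The sample data and the names LINES, WORDS, find_words and parallel_map are made up for illustration, and the match is a plain substring test like the one in the question. Threads keep the sketch simple; for CPU-bound work a process pool would be the more usual choice.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample data standing in for the log file and the search words.
LINES = ["error: disk full", "all good", "warning: disk almost full"]
WORDS = ["disk", "error"]


def find_words(numbered_line):
    """Map step: return [word, line_number] pairs for a single line."""
    line_number, text = numbered_line
    # Substring match, as in the question's code.
    return [[word, line_number] for word in WORDS if word in text]


def parallel_map(lines):
    """Apply find_words to every line using a pool of worker threads."""
    numbered = list(enumerate(lines, start=1))
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Executor.map yields results in the same order as the inputs.
        per_line = list(pool.map(find_words, numbered))
    # Flatten the per-line lists into one list of pairs.
    return [pair for pairs in per_line for pair in pairs]


pairs = parallel_map(LINES)
print(pairs)
```

Because each line is processed independently, the map step needs no shared state; only the later reduce step has to combine results.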
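Python 3.x provides map() as a built-in, and reduce() is available from the functools module; look them up in the Python docs. So that you can see how they work, I implemented a version of map/reduce based on your problem without using those library functions. Since you didn't specify how your data is stored, I made some assumptions about it, namely that the list of words of interest is given as a comma-separated file. To read the text file, I use readlines() to get an array of lines, and a regular expression pattern to split each line into words (that is, split on anything that isn't an alphanumeric character). Of course, this may not fit your needs, so you can change it to whatever makes sense for the files you're looking at.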
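For comparison, the same idea can be sketched with the library functions themselves. This is only an illustration under assumptions: TEXT_LINES and WORD_LIST are hypothetical in-memory stand-ins for the two input files. Note that functools.reduce calls its function as function(accumulator, item), which is the reverse of the argument order used by the hand-written reducer in this answer.

```python
import re
from functools import reduce

# Hypothetical sample data standing in for the two input files.
TEXT_LINES = ["Disk error on sda", "All good", "disk almost full"]
WORD_LIST = ["disk", "error", "missing"]


def my_reducer(accum_value, item):
    """Fold one [word, line_number] pair into the summary dict."""
    # functools.reduce passes the accumulator first and the item second.
    if item:
        word, line_number = item
        accum_value.setdefault(word, []).append(line_number)
    return accum_value


map_output = []
for line_number, text in enumerate(TEXT_LINES, start=1):
    # Lowercased words of the current line, split on non-word characters.
    line_words = [w.lower() for w in re.split(r"\W+", text)]

    def my_mapper(word):
        # Return [word, line_number] on a match, [] otherwise.
        if word in line_words:
            return [word, line_number]
        return []

    # Built-in map() is lazy in Python 3; extend() consumes it right away.
    map_output.extend(map(my_mapper, WORD_LIST))

summary = reduce(my_reducer, map_output, {})
print(summary)
```

Each word is checked at most once per line here, so the reducer does not need a duplicate check.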
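I tried to stay away from esoteric Python features (no lambdas!), so hopefully the implementation is clear. One last note: I use a loop to iterate over the lines of the text file and a map function to iterate over the list of words of interest. You could use nested map functions, but I wanted to keep track of the loop index (since you care about line numbers). If you really do want to nest the map functions, you could store the lines as tuples of line and line number when reading the file, or you could modify the map function to return the index (your choice).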
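The nested-map alternative mentioned above might look roughly like this. It is a sketch, assuming the lines are paired with their numbers via enumerate(); TEXT_LINES, WORD_LIST, match_line and match_word are hypothetical names chosen for illustration.

```python
import re

# Hypothetical sample data standing in for the two input files.
TEXT_LINES = ["Disk error on sda", "All good", "disk almost full"]
WORD_LIST = ["disk", "error", "missing"]


def match_line(numbered_line):
    """Outer map step: check every search word against one (number, text) tuple."""
    line_number, text = numbered_line
    line_words = [w.lower() for w in re.split(r"\W+", text)]

    def match_word(word):
        """Inner map step: [word, line_number] on a match, [] otherwise."""
        if word in line_words:
            return [word, line_number]
        return []

    return list(map(match_word, WORD_LIST))


# Store each line together with its 1-based line number.
numbered_lines = list(enumerate(TEXT_LINES, start=1))

# The outer map runs over lines; the inner map (inside match_line) runs over words.
nested = list(map(match_line, numbered_lines))

# Flatten the nested result and drop the empty placeholders.
map_output = [pair for per_line in nested for pair in per_line if pair]
print(map_output)
```

Carrying the line number inside the tuple removes the need for a loop index outside the mapper.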
Hope this helps!
#!/usr/bin/env python

# Regexp library
import re

# Map
# This function returns a new list containing the elements of the sequence
# after they have been modified by whatever function we passed in.
def mapper(function, sequence):
    # List to store the results of the map operation
    result = []
    # Iterate over each item in sequence and append the values to the results
    # list after they have been modified by the "function" supplied as an
    # argument in the mapper function call.
    for item in sequence:
        result.append(function(item))
    return result

# Reduce
# The purpose of the reduce function is to go through a list and combine the
# items according to a specified function. This specified function should
# combine an element with a base value.
def reducer(function, sequence, base_value):
    # We need a base value to serve as the starting point for the construction
    # of the result.
    # I will assume one is given, but in most cases you should include extra
    # validation here to either ensure one is given or choose a sensible default.
    # Initialize our accumulated value object with the base value
    accum_value = base_value
    # Iterate through the sequence items, applying the "function" provided and
    # storing the results in the accum_value object
    for item in sequence:
        accum_value = function(item, accum_value)
    return accum_value

# These functions should be sufficient to address your problem. What remains
# is simply to get the data from the text files and keep track of the lines in
# which words appear.
if __name__ == '__main__':
    word_list_file = 'FILEPATH GOES HERE'
    # Read in a file containing the words that will be searched for in the
    # text file (assumes words are given as a comma-separated list)
    infile = open(word_list_file, 'rt')  # Open file
    content = infile.read()  # Read the whole file as a single string
    # Split the string into a list of words. Strip whitespace and lowercase
    # each word so it can match the lowercased line words in my_mapper.
    word_list = [word.strip().lower() for word in content.split(',') if word.strip()]
    infile.close()

    target_text_file = 'FILEPATH GOES HERE'
    # Read in the text to analyze
    infile = open(target_text_file, 'rt')  # Open file
    target_text_lines = infile.readlines()  # Read the whole file as a list of lines
    infile.close()

    # With the data loaded, the overall strategy is to loop over the text
    # lines and use the map function to loop over word_list and see which
    # words are in the current text file line.

    # First, define the my_mapper function that will process your data and
    # will be passed to the map function
    def my_mapper(item):
        # Split the current sentence into words.
        # This splits on any run of non-word characters (anything other than
        # letters, digits, and underscore). The strategy can be revised to
        # find matches to a regular expression pattern based on the words in
        # the word list. Either way, make sure you choose a sensible strategy.
        # Note: k is the line index set by the for loop further down.
        current_line_words = re.split(r'\W+', target_text_lines[k])
        # Lowercase the words
        current_line_words = [word.lower() for word in current_line_words]
        # Check if the current item (word) is in the current_line_words list,
        # and if so, return the word and the line number
        if item in current_line_words:
            # Return k+1 because k begins at 0, but I assume line counting
            # begins with 1?
            return [item, k+1]
        else:
            # Technically, this does not need to be added; the function could
            # simply return None by default, but then the reducer would have
            # to handle None values explicitly, and I am being lazy :D
            return []

    # With the mapper function established, we can proceed to loop over the
    # text lines and use our map function to process each line against the
    # list of words.

    # This list will store the results of the map operation
    map_output = []
    # Loop over the text file lines, use mapper to find which words are in
    # which lines, and store the results in the map_output list. This is the
    # exciting stuff!
    for k in range(len(target_text_lines)):
        map_output.extend(mapper(my_mapper, word_list))

    # At this point, we should have a list of lists containing the words and
    # the lines they appeared in, and it should look like
    # [['word1', 1] ... ['word25', 5] ... [] ...]
    # As you can see, the post-map list has an entry for each word that
    # appeared in each line, and if a particular word did not appear in a
    # particular line, there is an empty list instead.

    # Now all that remains is to summarize our data, and that is what the
    # reduce function is for. We will iterate over the map_output list and
    # collect the words and the lines they appear on in an object with the
    # format {'word': [n1, n2, ...]}, where n1, n2, ... are the lines the word
    # appears in. As with the mapper function, the output of the reduce
    # function can be changed in the my_reducer function you supply to it. If
    # you'd rather it return something else (say, a word count), this is the
    # function to modify.
    def my_reducer(item, accum_value):
        # First, verify item is not empty
        if item != []:
            word, line_number = item
            if word not in accum_value:
                # First time we see this word: create a list holding the
                # current line number
                accum_value[word] = [line_number]
            elif line_number not in accum_value[word]:
                # Word already stored: append the line number unless this
                # word/line combination is already in the output dict
                accum_value[word].append(line_number)
        return accum_value

    # Now we can call the reduce function, save its output, print it to the
    # screen, and we're done!
    # (Note that for the base value we are just passing in an empty dict, {})
    reduce_results = reducer(my_reducer, map_output, {})

    # Print results to screen
    for result in reduce_results:
        print('word: {}, lines: {}'.format(result, reduce_results[result]))
Regarding python - iterating over an array and searching a file for each item in the array, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/18169794/