Whatever is assigned to the files variable is incorrect. Use the following code instead:
import glob
import os
list_of_files = glob.glob('/path/to/folder/*')  # '*' matches everything; use e.g. '*.csv' for a specific format
latest_file = max(list_of_files, key=os.path.getctime)
print(latest_file)
max(files, key=os.path.getctime) is quite incomplete code. What is files? It is probably a list of file names coming out of os.listdir().
But this list contains only the file name parts (a.k.a. "basenames"), because their path is common. To use it correctly, you have to combine each name with the path leading to it (the same path used to obtain the list).
Such as (untested):
import os

def newest(path):
    files = os.listdir(path)
    paths = [os.path.join(path, basename) for basename in files]
    return max(paths, key=os.path.getctime)
I lack the reputation to comment, but ctime from Marlon Abeykoon's response did not give the correct result for me. Using mtime does the trick, though (key=os.path.getmtime):
import glob
import os
list_of_files = glob.glob('/path/to/folder/*')  # '*' matches everything; use e.g. '*.csv' for a specific format
latest_file = max(list_of_files, key=os.path.getmtime)
print(latest_file)
I found two answers for that problem:
python os.path.getctime max does not return latest
Difference between python - getmtime() and getctime() in unix system
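Both linked questions come down to what the two timestamps mean. As far as I know, st_mtime is the time of the last content modification on every platform, whereas st_ctime is the last metadata change on Unix and the creation time on Windows. A minimal sketch that makes the difference visible (the file name sample.txt is hypothetical, and the one-hour offset is only for illustration):

```python
import os
import tempfile
import time

with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.txt")  # hypothetical file name
    with open(sample, "w") as fh:
        fh.write("data")

    # Backdate the modification time by one hour.
    an_hour_ago = time.time() - 3600
    os.utime(sample, (an_hour_ago, an_hour_ago))

    mtime = os.path.getmtime(sample)
    ctime = os.path.getctime(sample)

print("mtime:", time.ctime(mtime))
print("ctime:", time.ctime(ctime))
```

Here mtime is an hour old while ctime is not, so max(..., key=os.path.getctime) and max(..., key=os.path.getmtime) can disagree about which file is the newest.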
I've been using this in Python 3, including pattern matching on the filename.
from pathlib import Path

def latest_file(path: Path, pattern: str = "*"):
    files = path.glob(pattern)
    return max(files, key=lambda x: x.stat().st_ctime)
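A possible variant of the pathlib approach, as a sketch only: it skips directories and returns None instead of raising ValueError when nothing matches. The function name latest_file_or_none and the file names are made up for this example, and it sorts by st_mtime rather than st_ctime, which is an assumption about what "latest" should mean.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def latest_file_or_none(path: Path, pattern: str = "*") -> Optional[Path]:
    """Return the most recently modified regular file, or None if none match."""
    files = (p for p in path.glob(pattern) if p.is_file())
    return max(files, key=lambda p: p.stat().st_mtime, default=None)


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    empty_result = latest_file_or_none(folder)  # nothing in the folder yet

    old, new = folder / "old.log", folder / "new.log"  # hypothetical names
    old.write_text("old")
    new.write_text("new")
    # Set explicit modification times so the ordering is unambiguous.
    os.utime(old, (1_000_000_000, 1_000_000_000))
    os.utime(new, (1_500_000_000, 1_500_000_000))

    newest_name = latest_file_or_none(folder, "*.log").name

print(empty_result, newest_name)
```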
I would suggest using glob.iglob() instead of glob.glob(), as it is more memory-efficient. From the documentation:
glob.iglob(): Return an iterator which yields the same values as glob() without actually storing them all simultaneously.
This means glob.iglob() does not have to build the whole list of matches before max() consumes it.
I mostly use the code below to find the latest file matching my pattern:
LatestFile = max(glob.iglob(fileNamePattern), key=os.path.getctime)
NOTE: max() has two call forms. To find the latest file we use this one:
max(iterable, *[, key, default])
It needs an iterable, so the first argument must be an iterable.
To find the largest of several values passed directly, use the other form:
max(arg1, arg2, *args[, key])
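A short sketch of both call forms of max(), including the default argument for the case where the pattern matches nothing. The values and the *.csv pattern are arbitrary examples:

```python
import glob
import os
import tempfile

# Positional form: compare the arguments directly.
numbers_max = max(3, 7, 5)

# Iterable form with a key function.
longest = max(["a", "ccc", "bb"], key=len)

# Iterable form on an empty iterable: raises ValueError without a default.
try:
    max([])
    raised = False
except ValueError:
    raised = True

# With default=..., an empty result is reported instead of raising.
with tempfile.TemporaryDirectory() as tmp:
    pattern = os.path.join(tmp, "*.csv")  # matches nothing in the empty folder
    no_match = max(glob.iglob(pattern), key=os.path.getctime, default=None)

print(numbers_max, longest, raised, no_match)
```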
Try sorting the items by creation time. The example below sorts the files in a folder, newest first, and takes the first element, which is the latest.
import glob
import os
files_path = os.path.join(folder, '*')  # folder is the directory to search
files = sorted(glob.iglob(files_path), key=os.path.getctime, reverse=True)
print(files[0])
Most of the answers are correct, but if you need the latest two or three files, they either fail or need to be modified.
I find the sample below more useful, because the same code can return the latest 2, 3, or n files.
import glob
import os
folder_path = "/Users/sachin/Desktop/Files/"
files_path = os.path.join(folder_path, '*')
files = sorted(glob.iglob(files_path), key=os.path.getctime, reverse=True)
print(files[0])  # latest file
print(files[0], files[1])  # latest two files
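If only the n newest files are needed, heapq.nlargest avoids sorting the whole listing. This is a sketch, not part of the answer above: the function name newest_files and the file names are invented for the example, and it uses getmtime where the answer uses getctime.

```python
import glob
import heapq
import os
import tempfile


def newest_files(folder, n, pattern="*"):
    """Return up to n paths from folder, newest first by modification time."""
    candidates = glob.iglob(os.path.join(folder, pattern))
    return heapq.nlargest(n, candidates, key=os.path.getmtime)


with tempfile.TemporaryDirectory() as folder_path:
    for offset, name in enumerate(["a.txt", "b.txt", "c.txt"]):
        path = os.path.join(folder_path, name)
        with open(path, "w") as fh:
            fh.write(name)
        # Give each file a distinct, increasing modification time.
        stamp = 1_000_000_000 + offset
        os.utime(path, (stamp, stamp))

    top_two = [os.path.basename(p) for p in newest_files(folder_path, 2)]
    top_ten = newest_files(folder_path, 10)  # fewer files than requested

print(top_two, len(top_ten))
```

Unlike indexing with files[1], asking for more files than exist simply returns a shorter list.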
A much faster method on Windows (0.05 s) is to call a batch script that does this:
get_latest.bat
@echo off
for /f %%i in ('dir \\directory\in\question /b/a-d/od/t:c') do set LAST=%%i
echo %LAST%
where \\directory\in\question is the directory you want to investigate.
get_latest.py
from subprocess import Popen, PIPE
p = Popen("get_latest.bat", shell=True, stdout=PIPE,)
stdout, stderr = p.communicate()
print(stdout, stderr)
If it finds a file, stdout holds the file name (as bytes) and stderr is None, since only stdout is piped.
Use stdout.decode("utf-8").rstrip() to get a usable string representation of the file name.
(Edited to improve answer)
First define a function get_latest_file
def get_latest_file(path, *paths):
    fullpath = os.path.join(path, *paths)
    ...

get_latest_file('example', 'files', 'randomtext011.*.txt')
You may also use a docstring!
def get_latest_file(path, *paths):
    """Returns the name of the latest (most recent) file
    of the joined path(s)"""
    fullpath = os.path.join(path, *paths)
If you use Python 3, you can use iglob instead.
Complete code to return the name of the latest file:
import glob
import os

def get_latest_file(path, *paths):
    """Returns the name of the latest (most recent) file
    of the joined path(s)"""
    fullpath = os.path.join(path, *paths)
    files = glob.glob(fullpath)  # You may use iglob in Python3
    if not files:                # I prefer using the negation
        return None              # because it behaves like a shortcut
    latest_file = max(files, key=os.path.getctime)
    _, filename = os.path.split(latest_file)
    return filename
I tried the suggestions above and my program crashed. Then I figured out that the file I was trying to identify was in use, and calling os.path.getctime on it crashed.
What finally worked for me was:
files_before = glob.glob(os.path.join(my_path, '*'))
# ... code where the new file is created ...
new_file = set(files_before).symmetric_difference(set(glob.glob(os.path.join(my_path, '*'))))
This code gets the elements that differ between the two file listings.
It's not the most elegant solution, and if multiple files are created at the same time it probably won't be reliable.
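A self-contained sketch of the before/after comparison, under the assumption that only additions matter. It uses a plain set difference (after minus before), which reports only new paths, whereas symmetric_difference would also report files deleted in the meantime. The helper name snapshot and the file names are hypothetical:

```python
import glob
import os
import tempfile


def snapshot(folder):
    """Return the set of paths currently present in folder."""
    return set(glob.glob(os.path.join(folder, "*")))


with tempfile.TemporaryDirectory() as my_path:
    with open(os.path.join(my_path, "existing.txt"), "w") as fh:
        fh.write("already here")

    files_before = snapshot(my_path)

    # Stand-in for the code that creates the new file.
    with open(os.path.join(my_path, "created.txt"), "w") as fh:
        fh.write("new")

    new_files = snapshot(my_path) - files_before
    new_names = sorted(os.path.basename(p) for p in new_files)

print(new_names)
```

No timestamps are read, so this works even when stat calls on the new file fail, but the result is a set and may contain more than one path.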
On Linux you can also call shell tools from Python.
subprocess.run requires Python 3.5+; the capture_output and text arguments require Python 3.7+.
import subprocess
import sys

def find_latest_files(target_dir, count):
    cmd = f"ls -t {target_dir} | head -n{count}"
    try:
        # check=True makes a non-zero exit status raise CalledProcessError
        output = subprocess.run(cmd, shell=True, text=True, capture_output=True, check=True)
    except subprocess.CalledProcessError as err:
        sys.exit(f"Error: finding last modified file {err.stderr}")
    # returns a list of file names, newest first
    return output.stdout.splitlines()
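For comparison, a pure-Python sketch of roughly the same idea using os.scandir, which works on any platform. It is an approximation rather than an exact equivalent: unlike ls -t, it lists regular files only. The file names in the demo are made up.

```python
import os
import tempfile


def find_latest_files(target_dir, count):
    """Return the names of the count most recently modified files."""
    with os.scandir(target_dir) as entries:
        files = [(entry.stat().st_mtime, entry.name)
                 for entry in entries
                 if entry.is_file()]
    files.sort(reverse=True)  # newest first
    return [name for _, name in files[:count]]


with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "subdir"))  # ignored: not a regular file
    for offset, name in enumerate(["first.txt", "second.txt", "third.txt"]):
        path = os.path.join(tmp, name)
        with open(path, "w") as fh:
            fh.write(name)
        stamp = 1_000_000_000 + offset
        os.utime(path, (stamp, stamp))

    latest_two = find_latest_files(tmp, 2)

print(latest_two)
```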
More answers
What if instead of a file I want to find the latest created/modified folder ?
@Link The same code works for that. If you want to check whether it is a folder, you can test os.path.isdir(latest_file).
Weird. I had to use "min" to get the latest file. Some searching around hinted that it's OS-specific.
This is an excellent answer--THANK YOU! I like to work with pathlib.Path objects more than strings and os.path. With pathlib.Path objects your answer becomes: list_of_paths = folder_path.glob('*'); latest_path = max(list_of_paths, key=lambda p: p.stat().st_ctime)
@phil You can still use os.path.getctime as key, even with Path objects.
I am sure the downvoters can explain what exactly is wrong.
Dunno, tested for you, it does seem to work. On top of that, you were the only one to care to explain a bit. Reading the accepted answer made me think that 'glob' thing was needed, whereas it's absolutely not. Thanks
@David Of course. Just insert if basename.endswith('.csv') into the list comprehension.
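Spelled out, the suggested filter might look like the sketch below. The suffix and key parameters are additions for illustration only, and the demo passes os.path.getmtime so that the result does not depend on creation order:

```python
import os
import tempfile


def newest(path, suffix=".csv", key=os.path.getctime):
    """Return the newest entry in path whose name ends with suffix."""
    files = os.listdir(path)
    paths = [os.path.join(path, basename)
             for basename in files
             if basename.endswith(suffix)]
    return max(paths, key=key)


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical files; the .txt file is the newest overall but is filtered out.
    for name, stamp in [("old.csv", 1_000_000_000),
                        ("new.csv", 1_000_000_010),
                        ("newest.txt", 1_000_000_020)]:
        full = os.path.join(tmp, name)
        with open(full, "w") as fh:
            fh.write(name)
        os.utime(full, (stamp, stamp))

    newest_csv = os.path.basename(newest(tmp, key=os.path.getmtime))

print(newest_csv)
```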
@BreakBadSP If you want flexibility, you are right. If you are restricted to a certain directory, I don't see how yours can possibly be more efficient. But sometimes readability is more important than efficiency, so yours might indeed be better in that sense.
Thanks for this, I've used this in so many of my ETL functions!
On a Mac, getctime also gave the wrong result, and getmtime fixed it for me as well.
This would be even better if max's default argument were used to handle the case where no files match the path/pattern. max (and min) raise ValueError in that situation, so it is better to set a default (requires Python 3.4+).
I like this max() approach. In my case, I used a different key, key=os.path.basename, since the filenames had timestamps in them.
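A small sketch of that idea. It only works under the assumption that the timestamps in the names sort lexicographically (for example zero-padded ISO dates) and that the rest of the name is identical; the paths below are invented:

```python
import os

# Hypothetical paths with an ISO date embedded in each file name.
paths = [
    "/logs/a/report_2023-01-05.csv",
    "/logs/b/report_2024-03-01.csv",
    "/logs/c/report_2023-11-30.csv",
]

# Compare by file name only, so the directory part does not influence the result.
latest = max(paths, key=os.path.basename)

# Without the key, the directory names decide the comparison instead.
by_full_path = max(paths)

print(latest)
print(by_full_path)
```

No file system access is needed for this, so it also works for files that are locked or no longer carry meaningful timestamps.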
In your example, if I want to include the folder path for the fileNamePattern, how to do it?
Not sure why this is attracting downvotes. For those who need to do this task quickly, this is the fastest method I could find. And sometimes it is necessary to do this very quickly.
Have an upvote. I'm not doing this in Windows, but if you're looking for speed, the other answers require an iteration of all files in a directory. So if shell commands in your OS that specify a sort order of the listed files are available, pulling the first or last result of that should be faster.
Thanks. I'm actually more interested in a better solution than this (similarly fast, but pure Python), so I was hoping someone could elaborate on that.
Sorry, but I had to downvote, and I'll give you the courtesy of explaining why. The biggest reason is that it does not use Python (it is not cross-platform) and is thus broken unless run under Windows. Secondly, this is not a "faster method" (unless faster means quick-and-dirty-not-bothering-to-read-docs) --shelling out to another script is notoriously slow.
@MarkHu Actually, this script was born out of the need to check a large folder's content quickly from a Python script. So in this case "faster method" means it gets the file name of the newest folder the fastest (or at least faster than a pure Python method). Feel free to add a similar script for Linux, probably based on ls -Art | tail -n 1. Please evaluate the performance of a solution before making claims about it.
Where did you get the JuniperAccessLog-standalone-FCL_VPN part from?
This fails on 0 length files under Windows 10.