linux - How to use the Unix sort command to sort by human-readable numeric file size in a column?

Reposted by: 太空狗 | Updated: 2023-10-29 11:06:44

This question now has an answer: scroll to the end of this post for the solution.

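Apologies if this has already been answered here, but every answer I've found so far suggests using the -h flag or the -n flag, and neither of them works for me.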

I have some output from a curl command that gives me several columns of data. One of the columns is a human-readable file size ("1.6mb", "4.3gb", etc.).

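I'm using the Unix sort command to sort by that column, but it seems to sort alphabetically rather than numerically. I've tried both the -n and -h flags; they do change the order, but in neither case is the result in correct numeric order.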

I'm on a CentOS Linux machine, release 7.2.1511. The sort version is "sort (GNU coreutils) 8.22".

I've tried the -h flag in these different forms:

curl localhost:9200/_cat/indices | sort -k9,9h | head -n5
curl localhost:9200/_cat/indices | sort -k9 -h | head -n5
curl localhost:9200/_cat/indices | sort -k 9 -h | head -n5
curl localhost:9200/_cat/indices | sort -k9h | head -n5

I always get these results:

green open indexA            5 1        0       0   1.5kb    800b
green open indexB            5 1  9823178 2268791 152.9gb  76.4gb
green open indexC            5 1    35998    7106 364.9mb 182.4mb
green open indexD            5 1      108      11 387.1kb 193.5kb
green open indexE            5 1        0       0   1.5kb    800b

I've tried the -n flag in the same forms as above:

curl localhost:9200/_cat/indices | sort -k9,9n | head -n5
curl localhost:9200/_cat/indices | sort -k9 -n | head -n5
curl localhost:9200/_cat/indices | sort -k 9 -n | head -n5
curl localhost:9200/_cat/indices | sort -k9n | head -n5

I always get these results:

green open index1      5 1     1021       0   3.2mb   1.6mb
green open index2      5 1     8833       0   4.1mb     2mb
green open index3      5 1     4500       0     5mb   2.5mb
green open index4      1 0        3       0   3.9kb   3.9kb
green open index5      3 1  2516794       0   8.6gb   4.3gb

Edit: it turns out there were two problems:

1) sort expects single uppercase letters: M, K and G rather than mb, kb and gb (for bytes you can leave the suffix off).

2) sort includes leading blanks in the key unless you explicitly exclude them, which throws off the ordering.
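
Problem 1 can be reproduced without Elasticsearch. A minimal sketch, assuming GNU coreutils sort and using only the column-9 values from the -n output above; the LC_ALL=C prefix is an addition here, only to make sure the decimal point is ".":

```shell
# Column-9 values from the sample output above, in Elasticsearch's lowercase style.
# -n reads only the leading number and ignores the unit entirely.
by_n=$(printf '%s\n' 4.3gb 3.9kb 2mb 1.6mb 2.5mb | LC_ALL=C sort -n)

# -h recognises 'k' and 'K' but not lowercase 'm' or 'g', so the mb/gb values
# count as unit-less and 3.9kb ends up last.
by_h=$(printf '%s\n' 4.3gb 3.9kb 2mb 1.6mb 2.5mb | LC_ALL=C sort -h)

printf '%s\n' '--- sort -n ---' "$by_n" '--- sort -h ---' "$by_h"
```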

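The solution is to replace the lowercase letters with uppercase ones and to use the -b flag so that sort ignores leading blanks. (I based this on @Vinicius's solution below, because it's easier to read if you don't know regular expressions.)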

curl localhost:9200/_cat/indices | tr '[kmg]b' '[KMG] ' | sort -k9hb
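
One thing to keep in mind with this command: tr translates single characters, not the two-letter units. The sets '[kmg]b' and '[KMG] ' map k to K, m to M, g to G and b to a space (the brackets just map to themselves) anywhere on the line, so other columns change too. A quick check on one row from the output above, assuming GNU tr:

```shell
# tr works character by character: k->K, m->M, g->G, b->space, '['->'[', ']'->']'.
# Every column is affected, not only the size columns ("green" becomes "Green").
line='green open indexA            5 1        0       0   1.5kb    800b'
translated=$(printf '%s\n' "$line" | tr '[kmg]b' '[KMG] ')
printf '%s\n' "$translated"
```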

Best answer

Your "m" and "g" units should be uppercase. The GNU sort manual reads:

-h --human-numeric-sort --sort=human-numeric

Sort numerically, first by numeric sign (negative, zero, or positive); then by SI suffix (either empty, or ‘k’ or ‘K’, or one of ‘MGTPEZY’, in that order; see Block size); and finally by numeric value.
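
A small sketch of that ordering rule, using made-up sizes and assuming GNU sort. Because the suffix is compared before the number, 2000M still sorts ahead of 1G; -h assumes the values are already abbreviated the way du -h prints them.

```shell
# Made-up sizes: one without a suffix, two with K, one with M, one with G.
# sort -h groups by suffix first (none < K < M < G), then compares the numbers.
sorted=$(printf '%s\n' 1G 3K 800 2000M 1.5K | LC_ALL=C sort -h)
printf '%s\n' "$sorted"
```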

You can change curl's output with GNU sed like this:

curl localhost:9200/_cat/indices \
| sed 's/[0-9][mgtpezy]/\U&/g' \
| sort -k9,9h \
| head -n5

Output:

green open index4      1 0        3       0   3.9kb   3.9kb
green open index1      5 1     1021       0   3.2Mb   1.6Mb
green open index2      5 1     8833       0   4.1Mb     2Mb
green open index3      5 1     4500       0     5Mb   2.5Mb
green open index5      3 1  2516794       0   8.6Gb   4.3Gb

Other letters such as "b" are treated as "no unit":

green open indexA            5 1        0       0   1.5kb    800b
green open indexE            5 1        0       0   1.5kb    800b
green open indexD            5 1      108      11 387.1kb 193.5kb
green open indexC            5 1    35998    7106 364.9Mb 182.4Mb
green open indexB            5 1  9823178 2268791 152.9Gb  76.4Gb
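
To try the answer's pipeline without a running Elasticsearch node, the curl call can be replaced by the sample rows from the question. A minimal sketch, assuming GNU sed and GNU sort; the function name sort_by_store_size and the LC_ALL=C prefix are hypothetical additions for illustration, not part of the answer:

```shell
# Hypothetical helper: uppercase an SI unit letter that directly follows a digit
# (GNU sed's \U), then sort on column 9 only, using human-numeric order.
sort_by_store_size() {
  sed 's/[0-9][mgtpezy]/\U&/g' | LC_ALL=C sort -k9,9h
}

# Sample rows copied from the question, standing in for
# `curl localhost:9200/_cat/indices`.
sample_rows='green open index1      5 1     1021       0   3.2mb   1.6mb
green open index2      5 1     8833       0   4.1mb     2mb
green open index3      5 1     4500       0     5mb   2.5mb
green open index4      1 0        3       0   3.9kb   3.9kb
green open index5      3 1  2516794       0   8.6gb   4.3gb'

printf '%s\n' "$sample_rows" | sort_by_store_size
```

Because the sed pattern needs a digit right before the unit letter, the "green" and "open" columns are left alone on these rows.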

If needed, you can change the units back to lowercase by piping the sorted output through sed 's/[0-9][MGTPEZY]/\L&/g'.
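
The full round trip (uppercase, sort, lowercase again) can be sketched on a few made-up values, assuming GNU sed for \U and \L:

```shell
# Made-up sizes in Elasticsearch's lowercase style.
# 1st sed: uppercase m/g/... after a digit so sort -h recognises them; k is already accepted.
# 2nd sed: lowercase them again after sorting. \U and \L are GNU sed extensions.
restored=$(printf '%s\n' 4.3gb 3.9kb 1.6mb 800b |
  sed 's/[0-9][mgtpezy]/\U&/g' |
  LC_ALL=C sort -h |
  sed 's/[0-9][MGTPEZY]/\L&/g')
printf '%s\n' "$restored"
```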

A similar question, "linux - How to use the Unix sort command to sort by human-readable numeric file size in a column?", can be found on Stack Overflow: https://stackoverflow.com/questions/55926142/
