Given a tab-delimited file like this:
$ head train.txt
The Fulton County Grand Jury said Friday an investigation of Atlanta's recent primary election produced `` no evidence '' that any irregularities took place . AT NP-TL NN-TL JJ-TL NN-TL VBD NR AT NN IN NP$ JJ NN NN VBD `` AT NN '' CS DTI NNS VBD NN .
The jury further said in term-end presentments that the City Executive Committee , which had over-all charge of the election , `` deserves the praise and thanks of the City of Atlanta '' for the manner in which the election was conducted . AT NN RBR VBD IN NN NNS CS AT NN-TL JJ-TL NN-TL , WDT HVD JJ NN IN AT NN , `` VBZ AT NN CC NNS IN AT NN-TL IN-TL NP-TL '' IN AT NN IN WDT AT NN BEDZ VBN .
The September-October term jury had been charged by Fulton Superior Court Judge Durwood Pye to investigate reports of possible `` irregularities '' in the hard-fought primary which was won by Mayor-nominate Ivan Allen Jr. . AT NP NN NN HVD BEN VBN IN NP-TL JJ-TL NN-TL NN-TL NP NP TO VB NNS IN JJ `` NNS '' IN AT JJ NN WDT BEDZ VBN IN NN-TL NP NP NP .
`` Only a relative handful of such reports was received '' , the jury said , `` considering the widespread interest in the election , the number of voters and the size of this city '' . `` RB AT JJ NN IN JJ NNS BEDZ VBN '' , AT NN VBD , `` IN AT JJ NN IN AT NN , AT NN IN NNS CC AT NN IN DT NN '' .
The jury said it did find that many of Georgia's registration and election laws `` are outmoded or inadequate and often ambiguous '' . AT NN VBD PPS DOD VB CS AP IN NP$ NN CC NN NNS `` BER JJ CC JJ CC RB JJ '' .
It recommended that Fulton legislators act `` to have these laws studied and revised to the end of modernizing and improving them '' . PPS VBD CS NP NNS VB `` TO HV DTS NNS VBN CC VBN IN AT NN IN VBG CC VBG PPO '' .
The grand jury commented on a number of other topics , among them the Atlanta and Fulton County purchasing departments which it said `` are well operated and follow generally accepted practices which inure to the best interest of both governments '' . AT JJ NN VBD IN AT NN IN AP NNS , IN PPO AT NP CC NP-TL NN-TL VBG NNS WDT PPS VBD `` BER QL VBN CC VB RB VBN NNS WDT VB IN AT JJT NN IN ABX NNS '' .
Merger proposed NN-HL VBN-HL
However , the jury said it believes `` these two offices should be combined to achieve greater efficiency and reduce the cost of administration '' . WRB , AT NN VBD PPS VBZ `` DTS CD NNS MD BE VBN TO VB JJR NN CC VB AT NN IN NN '' .
The City Purchasing Department , the jury said , `` is lacking in experienced clerical personnel as a result of city personnel policies '' . AT NN-TL VBG-TL NN-TL , AT NN VBD , `` BEZ VBG IN VBN JJ NNS CS AT NN IN NN NNS NNS '' .
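Only the first column (tab-delimited) matters. I want to extract the list of unique words (including punctuation) from the first column and write it to a file. Assume the words are separated by spaces, i.e.: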
$ head train.txt | cut -f1
The Fulton County Grand Jury said Friday an investigation of Atlanta's recent primary election produced `` no evidence '' that any irregularities took place .
The jury further said in term-end presentments that the City Executive Committee , which had over-all charge of the election , `` deserves the praise and thanks of the City of Atlanta '' for the manner in which the election was conducted .
The September-October term jury had been charged by Fulton Superior Court Judge Durwood Pye to investigate reports of possible `` irregularities '' in the hard-fought primary which was won by Mayor-nominate Ivan Allen Jr. .
`` Only a relative handful of such reports was received '' , the jury said , `` considering the widespread interest in the election , the number of voters and the size of this city '' .
The jury said it did find that many of Georgia's registration and election laws `` are outmoded or inadequate and often ambiguous '' .
It recommended that Fulton legislators act `` to have these laws studied and revised to the end of modernizing and improving them '' .
The grand jury commented on a number of other topics , among them the Atlanta and Fulton County purchasing departments which it said `` are well operated and follow generally accepted practices which inure to the best interest of both governments '' .
Merger proposed
However , the jury said it believes `` these two offices should be combined to achieve greater efficiency and reduce the cost of administration '' .
The City Purchasing Department , the jury said , `` is lacking in experienced clerical personnel as a result of city personnel policies '' .
$ head train.txt | cut -f2
AT NP-TL NN-TL JJ-TL NN-TL VBD NR AT NN IN NP$ JJ NN NN VBD `` AT NN '' CS DTI NNS VBD NN .
AT NN RBR VBD IN NN NNS CS AT NN-TL JJ-TL NN-TL , WDT HVD JJ NN IN AT NN , `` VBZ AT NN CC NNS IN AT NN-TL IN-TL NP-TL '' IN AT NN IN WDT AT NN BEDZ VBN .
AT NP NN NN HVD BEN VBN IN NP-TL JJ-TL NN-TL NN-TL NP NP TO VB NNS IN JJ `` NNS '' IN AT JJ NN WDT BEDZ VBN IN NN-TL NP NP NP .
`` RB AT JJ NN IN JJ NNS BEDZ VBN '' , AT NN VBD , `` IN AT JJ NN IN AT NN , AT NN IN NNS CC AT NN IN DT NN '' .
AT NN VBD PPS DOD VB CS AP IN NP$ NN CC NN NNS `` BER JJ CC JJ CC RB JJ '' .
PPS VBD CS NP NNS VB `` TO HV DTS NNS VBN CC VBN IN AT NN IN VBG CC VBG PPO '' .
AT JJ NN VBD IN AT NN IN AP NNS , IN PPO AT NP CC NP-TL NN-TL VBG NNS WDT PPS VBD `` BER QL VBN CC VB RB VBN NNS WDT VB IN AT JJT NN IN ABX NNS '' .
NN-HL VBN-HL
WRB , AT NN VBD PPS VBZ `` DTS CD NNS MD BE VBN TO VB JJR NN CC VB AT NN IN NN '' .
AT NN-TL VBG-TL NN-TL , AT NN VBD , `` BEZ VBG IN VBN JJ NNS CS AT NN IN NN NNS NNS '' .
I can do it like this:
$ python
>>> fout = open('word.dict', 'w')
>>> fout.write('\n'.join(set(w for line in open('train.txt') for w in line.split('\t')[0].lower().split())))
>>> exit()
$ head word.dict
trenton
brevet
secondly
fig.
magnetic
doubts
monte
elisabeth
four
facilities
But is there a way to extract the same word list in shell/bash?
Best answer
Try this:
cut -f1 file | tr -s '[:space:]' '\n' | tr '[:upper:]' '[:lower:]' | sort -u
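cut -f1
Extracts the first tab-delimited column.
tr -s '[:space:]' '\n'
Replaces each run of whitespace with a single newline (-s squeezes repeats), producing a word list with one word per line.
tr '[:upper:]' '[:lower:]'
Converts every line to lowercase.
sort -u
Sorts the resulting word list and drops duplicates (-u).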
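For comparison, the same four steps can be sketched in Python. This is a minimal illustration, not part of the original answer: the function name `unique_words` and the in-memory sample data are assumptions made for the example, and Python sorts by code point, whereas the order produced by `sort -u` depends on the locale.

```python
import io

def unique_words(lines):
    """Return the sorted, lowercased unique tokens of the first tab-delimited column.

    `unique_words` is a hypothetical helper name used only for this sketch.
    """
    words = set()
    for line in lines:
        first_column = line.rstrip('\n').split('\t')[0]  # like: cut -f1
        for token in first_column.split():               # like: tr -s '[:space:]' '\n'
            words.add(token.lower())                     # like: tr '[:upper:]' '[:lower:]'
    return sorted(words)                                 # like: sort -u

# Made-up sample standing in for train.txt (assumption for the example).
sample = io.StringIO(
    "The jury said .\tAT NN VBD .\n"
    "Merger proposed\tNN-HL VBN-HL\n"
    "The City said .\tAT NN-TL VBD .\n"
)
result = unique_words(sample)
print('\n'.join(result))
```

To process the real file, pass an open file object instead of the sample, for example `unique_words(open('train.txt'))`, and write the joined result to `word.dict`.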
Regarding "python - How to extract a unique word list from a tab-delimited text file?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/40416064/