Using any sort plus GNU awk for time functions and gensub():
$ cat tst.sh
#!/usr/bin/env bash
sort -t, -k2,2 -k1,1n "${@:--}" |
awk '
BEGIN { FS=OFS="," }
($2 == p[2]) && ($3 == p[3]) {
for ( date=nextDate(p[1]); date<$1; date=nextDate(date) ) {
sub(/[^,]+/,date,p[0])
print p[0]
}
}
{
print
split($0,p)
p[0] = $0
}
function nextDate(date,secs) {
date = gensub(/(.{4})(..)(..)/,"\\1 \\2 \\3",1,date)
secs = mktime(date " 12 0 0") + (24 * 60 * 60)
return strftime("%Y%m%d",secs)
}
'
$ ./tst.sh file.txt
20230907,Allan,29,Marketing
20230908,Allan,29,Marketing
20230909,Allan,29,Marketing
20230910,Allan,29,Marketing
20230911,Allan,29,Marketing
20230912,Allan,29,VirtualAssistant
20230913,Allan,29,Programmer
20230920,Daniel,28,Engineer
20230921,Daniel,28,Engineer
20230922,Daniel,28, Photographer
Alternatively, using any awk, just replace the nextDate() function above with this version:
function nextDate(date, i,tmp,year,month,day,maxDays) {
year = substr(date,1,4)+0
month = substr(date,5,2)+0
day = substr(date,7,2)+0
# see https://www.timeanddate.com/date/leapyear.html
if ( (month == 2) && !(year % 4) && ((year % 100) || !(year % 400)) ) {
maxDays = 29
}
else {
if ( !(1 in daysIn) ) {
split("1 31 2 28 3 31 4 30 5 31 6 30 7 31 8 31 9 30 10 31 11 30 12 31",tmp," ")
for ( i=1; i in tmp; i+=2 ) {
daysIn[tmp[i]] = tmp[i+1]
}
}
maxDays = daysIn[month]
}
if ( ++day > maxDays ) {
day = 1
if ( ++month > 12 ) {
month = 1
year++
}
}
return sprintf("%04d%02d%02d",year,month,day)
}
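The portable version can be spot-checked on the awkward boundary dates with any POSIX awk, since it needs no time functions at all (a minimal harness; the dates are illustrative):

```shell
# Exercise the pure-string nextDate() on boundary dates with any POSIX awk.
awk '
function nextDate(date,  i,tmp,year,month,day,maxDays) {
    year  = substr(date,1,4)+0
    month = substr(date,5,2)+0
    day   = substr(date,7,2)+0
    if ( (month == 2) && !(year % 4) && ((year % 100) || !(year % 400)) )
        maxDays = 29
    else {
        if ( !(1 in daysIn) ) {
            split("1 31 2 28 3 31 4 30 5 31 6 30 7 31 8 31 9 30 10 31 11 30 12 31",tmp," ")
            for ( i=1; i in tmp; i+=2 )
                daysIn[tmp[i]] = tmp[i+1]
        }
        maxDays = daysIn[month]
    }
    if ( ++day > maxDays ) {
        day = 1
        if ( ++month > 12 ) { month = 1; year++ }
    }
    return sprintf("%04d%02d%02d",year,month,day)
}
BEGIN {
    print nextDate("20240228")   # leap year:          20240229
    print nextDate("20240229")   # end of leap Feb:    20240301
    print nextDate("20231231")   # year roll-over:     20240101
    print nextDate("21000228")   # 2100 is not a leap: 21000301
}'
```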
Dates are a bit tricky to deal with in awk, so I tend to write dedicated functions to deal with them.
Here is an example solution that assumes a sorted input (tested with gawk and mawk):
parse.awk
BEGIN {
sid = 60 * 60 * 24; # Seconds in a day
FS = OFS = ",";
}
# Convert date to seconds-since-the-unix-epoch (Jan 1, 1970)
function getdate(d) {
year = substr(d, 1, 4);
month = substr(d, 5, 2);
day = substr(d, 7, 2);
return mktime(year" "month" "day" 00 00 00");
}
# Add one day to date
function addday(d) {
return strftime("%Y%m%d", getdate(d) + sid);
}
($2 SUBSEP $3) in nameage {
d = nameage[$2,$3];
while (addday(d) < $1) {
print (addday(d), $2, $3, occupation[$2,$3]);
d = addday(d);
}
}
{
nameage[$2,$3] = $1;
occupation[$2,$3] = $4;
}
1;
Run it like this:
awk -f parse.awk infile
Output:
20230907,Allan,29,Marketing
20230908,Allan,29,Marketing
20230909,Allan,29,Marketing
20230910,Allan,29,Marketing
20230911,Allan,29,Marketing
20230912,Allan,29,VirtualAssistant
20230913,Allan,29,Programmer
20230920,Daniel,28,Engineer
20230921,Daniel,28,Engineer
20230922,Daniel,28, Photographer
Solution in TXR:
$ txr filldate.txr data
20230907,Allan,29,Marketing
20230908,Allan,29,Marketing
20230909,Allan,29,Marketing
20230910,Allan,29,Marketing
20230911,Allan,29,Marketing
20230912,Allan,29,VirtualAssistant
20230913,Allan,29,Programmer
20230920,Daniel,28,Engineer
20230921,Daniel,28,Engineer
20230922,Daniel,28, Photographer
Code in filldate.txr:
@(repeat)
@ (cases)
@date0,@name,@age,@rest0
@ (trailer)
@date1,@name,@age,@rest1
@ (bind dates @[giterate (op nequal date1)
(opip
(time-parse "%Y%m%d")
.(time-utc)
(+ 86400)
(time-str-utc "%Y%m%d"))
date0])
@ (output)
@ (repeat)
@dates,@name,@age,@rest0
@ (end)
@ (end)
@ (or)
@line
@ (do (put-line line))
@ (end)
@(end)
Colorized with Vim:
This might work for you (GNU sed and date):
sed -E '/\n/!N
/(,[^,]*,[^,]*,).*\n.*\1/{/^([^,]*,).*\n\1/D
s/^([^,]*)(,.*\n)/echo "&$(date -d "\1 +1 day" +%Y%m%d)\2"/e}
P;D' file
Use the N;P;D sed idiom, which appends the next line to the current line, prints/deletes the first line, and repeats.
In this case, the next line is appended only if the pattern space currently holds a single line.
Compare the second and third fields of those lines and if the same:
Compare the dates of those lines and if the same then delete the first line and repeat.
If the dates are not the same, insert a line with tomorrow's date between the first and second lines.
Regardless, print/delete the first line and repeat.
N.B. The brunt of the work is done by the evaluated substitution which needs to interpolate an echo command.
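The evaluated substitution hinges on GNU date accepting a YYYYMMDD date together with a relative "+1 day" item; a minimal check of that arithmetic (assumes GNU date):

```shell
# GNU date does the day arithmetic that the evaluated substitution
# interpolates via echo; it also handles leap years.
date -d "20230907 +1 day" +%Y%m%d   # 20230908
date -d "20240228 +1 day" +%Y%m%d   # 20240229
```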
If the file is not sorted use:
sort -t, -k2,3 -k1,1 file |
sed -E '/\n/!N
/(,[^,]*,[^,]*,).*\n.*\1/{/^([^,]*,).*\n\1/D
s/^([^,]*)(,.*\n)/echo "&$(date -d "\1 +1 day" +%Y%m%d)\2"/e}
P;D'
Using sqlite3:
#!/bin/bash
# create header in file.txt
a=$(head -1 file.txt | sed -e 's/[^,]*,*/#/g')
b=abcdefgh;
c=$(echo ${b:0:${#a}} | sed -E 's/(.)/\1,/g')
c=${c%?}
sed -i -e "1i$c" file.txt
# create output
sql=".import file.txt file
update file set a=date(substr(a,1,4)||'-'||substr(a,5,2)||'-'||substr(a,7,2));
with cte as (
select a,b,c,d,lead(a) over (partition by b order by a) as m from file
union all
select date(a,'1 day'),b,c,d,m
from cte
where date(a,'1 day') < m
)
select replace(a,'-','') as a,b,c,d
from cte
order by a,b;
"
db=$(mktemp)
echo "$sql" | sqlite3 -csv $db
# strip header and remove db
sed -i -e '1d' file.txt
rm -f $db
- A temporary file is created for the database (using mktemp).
- Headers are inserted into file.txt. This is done using the simplest names I could think of, so the first column will be named a, the second b, ... (see NOTE)
- In the SQL part the dates are formatted so 20230907 becomes 2023-09-07. This is so SQLite can calculate tomorrow using date('2023-09-07','1 day').
- When querying, the - signs in the date are removed again.
- The cte adds records based on the value of lead(a) over (partition by b order by a), which is the next date available for a person.
- Because I change the original input file (adding a header), I also remove this line at the end of the script. (see NOTE)
- Finally the temporary database is removed.
NOTE: I should not change the original input, but use another tempfile for that...
NOTE2: Because CSV is a weird format, output is not what you might expect for the Photographer. Output will be " Photographer". For more reading on this see: Spaces between separator and value in csv file
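The day-stepping inside the recursive CTE relies on SQLite's date() modifier arithmetic; a minimal check using the sqlite3 CLI (assuming it is installed):

```shell
# SQLite's date() function does the '1 day' stepping the CTE depends on,
# including leap years; replace() strips the dashes as in the final select.
sqlite3 :memory: "select date('2023-09-07','1 day');"                  # 2023-09-08
sqlite3 :memory: "select replace(date('2024-02-28','1 day'),'-','');"  # 20240229
```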
Here's a way to do it without needing to call mktime() and strftime() nonstop, at the expense of lots of regexes (it should be able to handle all leap year calculations):
echo '
20230907,Allan,29,Marketing
20230912,Allan,29,VirtualAssistant
20230913,Allan,29,Programmer
20230920,Daniel,28,Engineer
20240317,Daniel,28, Photographer' |
mawk '
function ______(__, ___, ____, _____, _) {
__ = (_ = "")__
____ = (_)____
while((__ = sprintf("%d",__ += __~/([01][1-9]|10|2[0-7])$/ ? 1 \
: __~ /1231$/ ? 8870 \
: __~/(11|[469])30$/ ? 71 : __~ /[013578]31$/ ? 70 \
: __~/22[89]$/ && __~((_ = \
__~"(([2468][048]|[13579][26]|(^|[0+-])[48])(00)?|"\
"0000|^[+-]?0*)....$") ? "9$" : "8$") ? 73-_ : 1)) < ____)
print __,
___, _____
} BEGIN { FS = (_ = "[ \t]*") (____ = OFS = ",")_ } {
if (___[_ = $2____$3] && __[_] + 1 < +$1)
______(__[_], _, $1, ___[_])
print __[_] = $1, _, ___[_] = $4 }'
20230907,Allan,29,Marketing
20230908,Allan,29,Marketing
20230909,Allan,29,Marketing
20230910,Allan,29,Marketing
20230911,Allan,29,Marketing
20230912,Allan,29,VirtualAssistant
20230913,Allan,29,Programmer
20230920,Daniel,28,Engineer
20230921,Daniel,28,Engineer
20230922,Daniel,28,Engineer
20230923,Daniel,28,Engineer
20230924,Daniel,28,Engineer
20230925,Daniel,28,Engineer
20230926,Daniel,28,Engineer
20230927,Daniel,28,Engineer
20230928,Daniel,28,Engineer
20230929,Daniel,28,Engineer
20230930,Daniel,28,Engineer
20231001,Daniel,28,Engineer
20231002,Daniel,28,Engineer
20231003,Daniel,28,Engineer
20231004,Daniel,28,Engineer
20231005,Daniel,28,Engineer
20231006,Daniel,28,Engineer
20231007,Daniel,28,Engineer
20231008,Daniel,28,Engineer
20231009,Daniel,28,Engineer
20231010,Daniel,28,Engineer
20231011,Daniel,28,Engineer
20231012,Daniel,28,Engineer
20231013,Daniel,28,Engineer
20231014,Daniel,28,Engineer
20231015,Daniel,28,Engineer
20231016,Daniel,28,Engineer
20231017,Daniel,28,Engineer
20231018,Daniel,28,Engineer
20231019,Daniel,28,Engineer
20231020,Daniel,28,Engineer
20231021,Daniel,28,Engineer
20231022,Daniel,28,Engineer
20231023,Daniel,28,Engineer
20231024,Daniel,28,Engineer
20231025,Daniel,28,Engineer
20231026,Daniel,28,Engineer
20231027,Daniel,28,Engineer
20231028,Daniel,28,Engineer
20231029,Daniel,28,Engineer
20231030,Daniel,28,Engineer
20231031,Daniel,28,Engineer
20231101,Daniel,28,Engineer
20231102,Daniel,28,Engineer
20231103,Daniel,28,Engineer
20231104,Daniel,28,Engineer
20231105,Daniel,28,Engineer
20231106,Daniel,28,Engineer
20231107,Daniel,28,Engineer
20231108,Daniel,28,Engineer
20231109,Daniel,28,Engineer
20231110,Daniel,28,Engineer
20231111,Daniel,28,Engineer
20231112,Daniel,28,Engineer
20231113,Daniel,28,Engineer
20231114,Daniel,28,Engineer
20231115,Daniel,28,Engineer
20231116,Daniel,28,Engineer
20231117,Daniel,28,Engineer
20231118,Daniel,28,Engineer
20231119,Daniel,28,Engineer
20231120,Daniel,28,Engineer
20231121,Daniel,28,Engineer
20231122,Daniel,28,Engineer
20231123,Daniel,28,Engineer
20231124,Daniel,28,Engineer
20231125,Daniel,28,Engineer
20231126,Daniel,28,Engineer
20231127,Daniel,28,Engineer
20231128,Daniel,28,Engineer
20231129,Daniel,28,Engineer
20231130,Daniel,28,Engineer
20231201,Daniel,28,Engineer
20231202,Daniel,28,Engineer
20231203,Daniel,28,Engineer
20231204,Daniel,28,Engineer
20231205,Daniel,28,Engineer
20231206,Daniel,28,Engineer
20231207,Daniel,28,Engineer
20231208,Daniel,28,Engineer
20231209,Daniel,28,Engineer
20231210,Daniel,28,Engineer
20231211,Daniel,28,Engineer
20231212,Daniel,28,Engineer
20231213,Daniel,28,Engineer
20231214,Daniel,28,Engineer
20231215,Daniel,28,Engineer
20231216,Daniel,28,Engineer
20231217,Daniel,28,Engineer
20231218,Daniel,28,Engineer
20231219,Daniel,28,Engineer
20231220,Daniel,28,Engineer
20231221,Daniel,28,Engineer
20231222,Daniel,28,Engineer
20231223,Daniel,28,Engineer
20231224,Daniel,28,Engineer
20231225,Daniel,28,Engineer
20231226,Daniel,28,Engineer
20231227,Daniel,28,Engineer
20231228,Daniel,28,Engineer
20231229,Daniel,28,Engineer
20231230,Daniel,28,Engineer
20231231,Daniel,28,Engineer
20240101,Daniel,28,Engineer
20240102,Daniel,28,Engineer
20240103,Daniel,28,Engineer
20240104,Daniel,28,Engineer
20240105,Daniel,28,Engineer
20240106,Daniel,28,Engineer
20240107,Daniel,28,Engineer
20240108,Daniel,28,Engineer
20240109,Daniel,28,Engineer
20240110,Daniel,28,Engineer
20240111,Daniel,28,Engineer
20240112,Daniel,28,Engineer
20240113,Daniel,28,Engineer
20240114,Daniel,28,Engineer
20240115,Daniel,28,Engineer
20240116,Daniel,28,Engineer
20240117,Daniel,28,Engineer
20240118,Daniel,28,Engineer
20240119,Daniel,28,Engineer
20240120,Daniel,28,Engineer
20240121,Daniel,28,Engineer
20240122,Daniel,28,Engineer
20240123,Daniel,28,Engineer
20240124,Daniel,28,Engineer
20240125,Daniel,28,Engineer
20240126,Daniel,28,Engineer
20240127,Daniel,28,Engineer
20240128,Daniel,28,Engineer
20240129,Daniel,28,Engineer
20240130,Daniel,28,Engineer
20240131,Daniel,28,Engineer
20240201,Daniel,28,Engineer
20240202,Daniel,28,Engineer
20240203,Daniel,28,Engineer
20240204,Daniel,28,Engineer
20240205,Daniel,28,Engineer
20240206,Daniel,28,Engineer
20240207,Daniel,28,Engineer
20240208,Daniel,28,Engineer
20240209,Daniel,28,Engineer
20240210,Daniel,28,Engineer
20240211,Daniel,28,Engineer
20240212,Daniel,28,Engineer
20240213,Daniel,28,Engineer
20240214,Daniel,28,Engineer
20240215,Daniel,28,Engineer
20240216,Daniel,28,Engineer
20240217,Daniel,28,Engineer
20240218,Daniel,28,Engineer
20240219,Daniel,28,Engineer
20240220,Daniel,28,Engineer
20240221,Daniel,28,Engineer
20240222,Daniel,28,Engineer
20240223,Daniel,28,Engineer
20240224,Daniel,28,Engineer
20240225,Daniel,28,Engineer
20240226,Daniel,28,Engineer
20240227,Daniel,28,Engineer
20240228,Daniel,28,Engineer
20240229,Daniel,28,Engineer
20240301,Daniel,28,Engineer
20240302,Daniel,28,Engineer
20240303,Daniel,28,Engineer
20240304,Daniel,28,Engineer
20240305,Daniel,28,Engineer
20240306,Daniel,28,Engineer
20240307,Daniel,28,Engineer
20240308,Daniel,28,Engineer
20240309,Daniel,28,Engineer
20240310,Daniel,28,Engineer
20240311,Daniel,28,Engineer
20240312,Daniel,28,Engineer
20240313,Daniel,28,Engineer
20240314,Daniel,28,Engineer
20240315,Daniel,28,Engineer
20240316,Daniel,28,Engineer
20240317,Daniel,28,Photographer
Clever! I would have mindlessly implemented the POSIX versions of mktime and strftime when in fact you don't even need to convert the date at all. +1
@Fravadona If you implemented those anyway and put them under a BSD2 license, I could include them in cppawk, where they would be available via some new header file (which would suppress the definitions on GNU Awk).
@Kaz my implementation of mktime doesn't handle timezones, and strftime doesn't return a formatted string but fills an array with the components of the date instead. Supporting timezones and the format string would need a lot of code.
mktime and strftime are GNU specific, but OP's "UNIX" might have them. It's still possible to implement the same functionality with a POSIX awk though.
@Fravadona: This works at least with gawk, mawk and nawk
@Fravadona: I take it back, this does not work in nawk because of the lacking time functions. Adding a note.
CSV isn't particularly weird IMO. Per RFC4180 it's OK in CSV to have spaces at the start or end of a field as long as there aren't double quotes inside the field with the spaces outside of the quotes. So ,<blank>Photographer as the OP has is fine and so is ,"<blank>Photographer" but not ,<blank>"Photographer".