
c - Read a log file and write it to another file in a specified format

Reposted. Author: 可可西里. Updated: 2023-11-01 10:12:11

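I have a log text file (*.txt) with about 2.5 million entries. Using the C language, I have to read it and write it to another file in a specific format.

The file to be read looks like this: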

202.32.92.47 - - [01/Jun/1995:00:00:59 -0600] "GET /~scottp/publish.html" 200 271 - -
ix-or7-27.ix.netcom.com RFC-1413 John Thomas [01/Jun/1995:00:02:51 -0600] "GET /~ladd/ostriches.html" 200 205908 - "Mozilla/5.0 (X11; U; Linux i686; es-ES;rv:1.7.5)"
ppp-4.pbmo.net - John Thomas [07/Dec/1995:13:20:28 -0600] "GET /dcs/courses/cai/html/introduction_lesson/index.html HTTP/1.0" 500 - "http://www.wikipedia.org/" "Mozilla/5.0 (X11; U; Linux i686; es-ES;rv:1.7.5)"
ppp-4.pbmo.net - - [07/Dec/1995:13:20:37 -0600] "GET /dcs/courses/cai/html/index.html HTTP/1.0" 500 4528 - -
lbm2.niddk.nih.gov RFC-1413 - [07/Dec/1995:13:21:03 -0600] "GET /~ladd/vet_libraries.html" 200 11337 "http://www.wikipedia.org/" -

Each line of this (raw) log file has the format: IP ID NAME [DATE:TIME TIMEZONE] "METHOD DIR" STATUS MB "WEB" "FROM". To make that easier to see, here are the previous log samples with the fields separated by ||:

|| ix-or7-27.ix.netcom.com || RFC-1413 || John Thomas || [01/Jun/1995 || :00:02:51 || -0600] || "GET || /~ladd/ostriches.html" || 200 || 205908 || - || "Mozilla/5.0 (X11; U; Linux i686; es-ES;rv:1.7.5)" ||
|| ppp-4.pbmo.net || - || John Thomas || [07/Dec/1995 || :13:20:28 || -0600] || "GET || /dcs/courses/cai/html/introduction_lesson/index.html HTTP/1.0" || 500 || - || "http://www.wikipedia.org/" || "Mozilla/5.0 (X11; U; Linux i686; es-ES;rv:1.7.5)" ||
|| ppp-4.pbmo.net || - || - || [07/Dec/1995 || :13:20:37 || -0600] || "GET || /dcs/courses/cai/html/index.html HTTP/1.0" || 500 || 4528 || - || - ||
|| lbm2.niddk.nih.gov || RFC-1413 || - || [07/Dec/1995 || :13:21:03 || -0600] || "GET || /~ladd/vet_libraries.html" || 200 || 11337 || "http://www.wikipedia.org/" || - ||

For example, for the first line:

IP = ix-or7-27.ix.netcom.com 
ID = RFC-1413
NAME = John Thomas
DATE = 01/Jun/1995
TIME = 00:02:51
TIMEZONE = -0600
METHOD = GET
DIR = /~ladd/ostriches.html
STATUS = 200
MB = 205908
WEB = -
FROM = Mozilla/5.0 (X11; U; Linux i686; es-ES;rv:1.7.5)

(The value of each field can be either text or -.)

The expected output is:

ix-or7-27.ix.netcom.com | RFC-1413 | John Thomas | 01/Jun/1995 | 00:02:51 | -06 | GET | /~ladd/ostriches.html | 200 | 205908 | - | Mozilla/5.0 (X11; U; Linux i686; es-ES;rv:1.7.5)
ppp-4.pbmo.net | - | John Thomas | 07/Dec/1995 | 13:20:28 | -06 | GET | /dcs/courses/cai/html/introduction_lesson/index.html HTTP/1.0 | 500 | - | http://www.wikipedia.org/ | Mozilla/5.0 (X11; U; Linux i686; es-ES;rv:1.7.5)
ppp-4.pbmo.net | - | - | 07/Dec/1995 | 13:20:37 | -06 | GET | /dcs/courses/cai/html/index.html HTTP/1.0 | 500 | 4528 | - | -
lbm2.niddk.nih.gov | RFC-1413 | - | 07/Dec/1995 | 13:21:03 | -06 | GET | /~ladd/vet_libraries.html | 200 | 11337 | http://www.wikipedia.org/ | -

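So the format is obtained by splitting the original line and adding | between the fields. The fields are:

  • First parameter (IP): catch all up to space
  • Second parameter (ID): catch all up to space (can be a string or a -)
  • Third parameter (NAME): catch all up to [ (can be a string with spaces or a -)
  • Fourth parameter (DATE): catch all up to :
  • Fifth parameter (TIME): catch all up to space
  • Sixth parameter (TIMEZONE): catch all up to ] (-dddd must be converted to -dd)
  • Seventh parameter (METHOD): catch all up to space
  • Eighth parameter (DIR): catch all up to space
  • Ninth parameter (STATUS): catch all up to space
  • Tenth parameter (MB): catch all up to space
  • Eleventh parameter (WEB): catch all inside "" (or -)
  • Twelfth parameter (FROM): catch all inside "" (or -)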

Any idea how I can achieve this?

Thanks.


Edit 1:

The code I am using to read/write the file is:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    // variables (fgets() needs a char buffer, not an int array)
    char line[255];
    char *token;

    // open files
    FILE *fpr = fopen("myLogFile.txt", "r");
    FILE *fpw = fopen("myFormattedLogFile.txt", "w");

    // read file
    while (fgets(line, sizeof line, fpr) != NULL) {
        token = strtok(line, " ");
        while (token != NULL) {
            // write file
            fprintf(fpw, "%s | ", token);
            token = strtok(NULL, " ");
        }
        fprintf(fpw, "\n");
    }

    // close files
    fclose(fpr);
    fclose(fpw);

    return 0;
}

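But since John Thomas contains a space and is split into two tokens, it doesn't work, and I don't know how to produce the right format (remove [, ], ", change the number format, split date and time, handle whether a field is a string or -, ...).


Edit 2: @CHUX's solution

I have some questions:

// 6th pattern: how can I recover it as a string?
// 7th pattern: how can I remove the first "?
// 8th pattern: how can I remove the last "?
// How can I catch everything inside ""? Which pattern should I use?
// What is the variable n?
// What is Invalid_Input? It appears as undeclared.

The code, updated after your solution, is: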

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define LINE_LENGTH 255

// First parameter (IP): catch all up to space
#define IP_FMT "%s"
char IP[LINE_LENGTH];

// Second parameter (ID): catch all up to space (can be a string or a -)
#define ID_FMT "%s"
char ID[LINE_LENGTH];

// Third parameter (NAME): catch all up to [ (can be a string with spaces or a -)
#define NAME_FMT " %[^[]["
char NAME[LINE_LENGTH];

// Fourth parameter (DATE): catch all up to :
#define DATE_FMT " %11[^:]:"
char DATE[11+1];

// Fifth parameter (TIME): catch all up to space
#define TIME_FMT "%8s"
char TIME[8+1];

// Sixth parameter (TIMEZONE): catch all up to ] (-dddd must be converted in -dd)
#define TIMEZONE_FMT "%5d]"
int TIMEZONE;

// Seventh parameter (METHOD): catch all up to space
#define METHOD_FMT "%s"
char METHOD[LINE_LENGTH];

// Eighth parameter (DIR): catch all up to space
#define DIR_FMT "%s"
char DIR[LINE_LENGTH];

// Ninth parameter (STATUS): catch all up to space
#define STATUS_FMT "%s"
char STATUS[LINE_LENGTH];

// Tenth parameter (MB): catch all up to space
#define MB_FMT "%s"
char MB[LINE_LENGTH];

// Eleventh parameter (WEB): catch all inside "" (or -)

// Twelfth parameter (FROM): catch all inside "" (or -)



int main() {
    // variables
    char *line = malloc(LINE_LENGTH);
    char *token;
    int position = 0;

    // open files
    FILE *fpr = fopen("log.txt","r");
    FILE *fpw = fopen("myFormattedLogFile.txt","w");

    // read file
    while (fgets(line, LINE_LENGTH, fpr) != NULL) {

        int n = 0;

        sscanf
        (
            line,
            IP_FMT ID_FMT NAME_FMT DATE_FMT TIME_FMT TIMEZONE_FMT METHOD_FMT DIR_FMT STATUS_FMT MB_FMT " %n",
            IP, ID, NAME, DATE, TIME, &TIMEZONE, METHOD, DIR, STATUS, MB, &n
        );

        NAME[strlen(NAME)-1] = '\0';

        fprintf
        (
            fpw,
            "%s | %s | %s | %s | %s | %d | %s | %s | %s | %s\n",
            IP, ID, NAME, DATE, TIME, TIMEZONE, METHOD, DIR, STATUS, MB
        );

    }

    // close files
    fclose(fpr);
    fclose(fpw);

    return 0;
}

Best answer

sscanf() with "%n" can do the job. Some post-processing may be needed, as with NAME.
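To illustrate what "%n" contributes, here is a minimal sketch (not part of the original answer; the helper name scan_timestamp is made up for this example). "%n" stores the number of characters consumed so far, and it is only written if scanning actually reaches it, so starting with n = 0 and testing n afterwards tells whether the whole format matched.

```c
#include <stdio.h>

/* Hypothetical helper: scans "[DATE:TIME ZONE]" from s.
   Returns the number of characters consumed, or 0 if s did not match. */
static int scan_timestamp(const char *s, char date[12], char tm[9], int *zone)
{
    int n = 0;  /* stays 0 unless sscanf() gets as far as %n */

    /* Every directive before %n must match for n to be assigned. */
    sscanf(s, " [%11[^:]:%8s %d] %n", date, tm, zone, &n);
    return n;
}
```

A non-zero result is also the offset at which the rest of the line starts, so a caller can continue scanning from s + n.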

For such a complex format, I recommend building the format string by string literal concatenation:

// First parameter (IP): catch all up to space
#define IP_FMT "%s"
char IP[sizeof line];

// Second parameter (ID): catch all up to space (can be a string or a -)
#define ID_FMT "%s"
char ID[sizeof line];

// Third parameter (NAME): catch all up to [ (can be a string with spaces or a -)
#define NAME_FMT " %[^[]["
char NAME[sizeof line];

// Fourth parameter (DATE): catch all up to :
#define DATE_FMT " %11[^:]:"
char DATE[11+1];

// Fifth parameter (TIME): catch all up to space
#define TIME_FMT "%8s"
char TIME[8+1];

// Sixth parameter (TIMEZONE): catch all up to ] (-dddd must be converted in -dd)
#define TIMEZONE_FMT "%5d]"
int TIMEZONE;

// Other fields left for OP

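int n = 0;
sscanf(line, IP_FMT ID_FMT NAME_FMT DATE_FMT TIME_FMT TIMEZONE_FMT " %n",
    IP, ID, NAME, DATE, TIME, &TIMEZONE, &n);

// n is still 0 if the line did not match the whole format
// Invalid_Input and trim() are not defined here; like the remaining fields, they are left for OP
if (n == 0) return Invalid_Input;
trim(NAME);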
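The fields the answer leaves open could be handled along the same lines. The following is only a sketch under stated assumptions, not the answer's own code: the struct log_entry type, the helper names (trim_trailing, scan_quoted_or_token, parse_line, format_entry) and the FIELD_MAX limit are all invented here. It assumes WEB and FROM are always either a non-empty quoted string or a bare -, and that time zones are whole hours, so -0600 can be reduced to -06 by integer division.

```c
#include <stdio.h>
#include <string.h>

#define FIELD_MAX 1024  /* assumed field capacity; the 1023 widths below must match */

struct log_entry {      /* hypothetical record type, not from the original answer */
    char ip[FIELD_MAX], id[FIELD_MAX], name[FIELD_MAX];
    char date[11 + 1], time[8 + 1];
    int  tz;            /* raw zone as read by %d, e.g. -600 for -0600 */
    char method[16], dir[FIELD_MAX], status[16], mb[32];
    char web[FIELD_MAX], from[FIELD_MAX];
};

/* Strip trailing spaces in place; one possible stand-in for trim(). */
static void trim_trailing(char *s)
{
    size_t len = strlen(s);
    while (len > 0 && s[len - 1] == ' ')
        s[--len] = '\0';
}

/* Read a "quoted string" (quotes dropped) or a bare token such as - from *p.
   *p must point at a non-space character. Returns 1 on success, 0 otherwise. */
static int scan_quoted_or_token(const char **p, char *out)
{
    int n = 0;
    if (**p == '"')
        sscanf(*p, "\"%1023[^\"]\" %n", out, &n);
    else
        sscanf(*p, "%1023s %n", out, &n);
    *p += n;
    return n > 0;
}

/* Parse one log line into *e. Returns 1 on success, 0 if the line is malformed. */
static int parse_line(const char *line, struct log_entry *e)
{
    const char *p = line;
    int n = 0;

    /* IP ID NAME [DATE:TIME TIMEZONE] */
    sscanf(p, "%1023s %1023s %1023[^[][%11[^:]:%8s %d] %n",
           e->ip, e->id, e->name, e->date, e->time, &e->tz, &n);
    if (n == 0) return 0;
    trim_trailing(e->name);   /* NAME was read up to [, so it ends in a space */
    p += n;

    /* "METHOD DIR" STATUS MB; the scanset stops at the closing quote */
    n = 0;
    sscanf(p, "\"%15s %1023[^\"]\" %15s %31s %n",
           e->method, e->dir, e->status, e->mb, &n);
    if (n == 0) return 0;
    p += n;

    /* WEB and FROM are each either "quoted" or a bare - */
    return scan_quoted_or_token(&p, e->web) && scan_quoted_or_token(&p, e->from);
}

/* Write the |-separated form into out; -0600 becomes -06 via tz / 100. */
static int format_entry(const struct log_entry *e, char *out, size_t size)
{
    return snprintf(out, size,
                    "%s | %s | %s | %s | %s | %+03d | %s | %s | %s | %s | %s | %s",
                    e->ip, e->id, e->name, e->date, e->time, e->tz / 100,
                    e->method, e->dir, e->status, e->mb, e->web, e->from);
}
```

In a fgets() loop, each line would be passed to parse_line() and, on success, the text produced by format_entry() written to the output file. Reading the request with a scanset up to the closing quote is what removes both quotes and keeps " HTTP/1.0" inside DIR, as the expected output requires. Lines longer than the fgets() buffer would need separate handling.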

Regarding "c - Read a log file and write it to another file in a specified format", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/53838039/
