
javascript - Converting a large txt file to any structured format

Reposted. Author: 行者123. Updated: 2023-11-30 19:56:26

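I have a large txt file with space-separated "columns" that I would like to convert to JSON, xlsx, csv, or a similar format so that I can work with the data programmatically.

The file is large, so I won't post all of it, but here is a snippet as a sample: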

ID number Name                              TitlFed  Grade GamesBorn Flag
10207538 A E M, Doshtagir BAN 1864 0 i
10206612 A K M, Sourab BAN 1714 0 i
5045886 A K, Kalshyan IND 1958 0 1964
8605360 A La, Teng Hua CHN 1915 0 1993 wi
5031605 A, Akshaya IND 2016 29 1994 w
5080444 A, Sohita IND 1447 0 1995 wi
5706068 A. Nashir, Mohd Khairul Nazrin MAS 1878 0 i
10201971 A.f.m., Mahfuzul Haque BAN 1690 0
10202650 A.k. Azad, Akand BAN 1692 0 i
10210997 A.K.M. Mehfuz BAN 2015 0
24663832 Aab, Manfred GER 1808 0 1963
1701991 Aaberg, Anton SWE 2374 4 1972
1513966 Aabid, Ryaad NOR 1642 0 1958
1407589 Aabling-Thomsen, Jakob f DEN 2331 18 1985
12524670 Aadeli, Arvin IRI 2015 0
5072662 Aadhityaa, M IND 1898 10 1999
25034677 Aadish S IND 1528 5 1999
5086183 Aaditt, M K IND 1610 0 1996 i
5027942 Aaditya, Jagadeesh IND 1814 16 1998
25011952 Aadityan G IND 1621 7 2001
5063485 Aadityan, N. IND 1758 8 1996
1427024 Aagaard, Gert DEN 2030 7 1966
1401815 Aagaard, Jacob g DEN 2506 9 1973
1411802 Aagaard, Kasper DEN 1913 0 1992 i
1017942 Aagaard, Michael NED 2075 0 1960
1406248 Aage, Bjarke DEN 2068 0 1978 i
1506064 Aagedal, Geir Ole NOR 1833 7 1957
25021044 Aagney L., Narasimhan IND 1285 6 2000
10205640 Aahelee, Sarker BAN 1577 0 w
25014510 Aakanksha Hagawane IND 1622 0 2000 w
25030388 Aakash Jain IND 1577 7 1998
35004336 Aakash S B IND 1235 10 1998
5093295 Aakasha IND 1620 3 2000 w
504599 Aakio, Seppo FIN 2078 0 1954
1402315 Aalbaek, Kurt Frede Nissen DEN 1440 0 1944
1024388 Aalbers, Klaas NED 1891 0 1955 i
2252465 Aalbersberg Kroon, Pedro ESP 1878 0 1933
2218682 Aalders, Hendricus ESP 2021 0 1930 i
1033948 Aalders, Peter NED 1903 0 1964
501956 Aaltio, Erkki FIN 2118 0 1935
1504452 Aandal, Kristian NOR 2012 0 1985 i

I program in JavaScript, so ideally I would like to convert it to JSON, with each player/ID in its own object, like this:

var AllPlayers =
[{
    "2434324243":
    {
        "name":"some guy",
        "title":"f",
        "fed":"USA",
        "grade":"1999",
        "games":"3",
        "born":"1990"
    },
    "8787878887":
    {
        "name":"anyone",
        "title":"",
        "fed":"BER",
        "grade":"2222",
        "games":"6",
        "born":"1970"
    }
}
]

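I tried using the fs module in Node to read the txt file, then worked out the length of each row (71 characters) and tried to push each row into an array. However, whitespace seemed to be dropped when the file was read, which made this approach unworkable because each person's information has a variable length.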

  var fs = require('fs');
var allPlayers=[];
thisPlayer='';
//1st row length =74
//other rows 71
//14895 rows
fs.readFile('jul12frl.txt', 'utf8', function(err, contents) {
for(let x=74;x<14895;x++){
thisPlayer+=contents[x];
if(thisPlayer.length==71){
allPlayers.push(thisPlayer);
thisPlayer='';
}
}
});

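I also tried converting the txt file to Excel format with Excel's built-in import wizard, but it did not pick up all the columns I needed: it merged the Name/title/fed/grade columns into one large column.

Best answer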

const data = `10207538  A E M, Doshtagir                      BAN  1864    0        i
10206612 A K M, Sourab BAN 1714 0 i
5045886 A K, Kalshyan IND 1958 0 1964
8605360 A La, Teng Hua CHN 1915 0 1993 wi
5031605 A, Akshaya IND 2016 29 1994 w
5080444 A, Sohita IND 1447 0 1995 wi
5706068 A. Nashir, Mohd Khairul Nazrin MAS 1878 0 i
10201971 A.f.m., Mahfuzul Haque BAN 1690 0
10202650 A.k. Azad, Akand BAN 1692 0 i
10210997 A.K.M. Mehfuz BAN 2015 0
24663832 Aab, Manfred GER 1808 0 1963
1701991 Aaberg, Anton SWE 2374 4 1972
1513966 Aabid, Ryaad NOR 1642 0 1958
1407589 Aabling-Thomsen, Jakob f DEN 2331 18 1985
12524670 Aadeli, Arvin IRI 2015 0
5072662 Aadhityaa, M IND 1898 10 1999
25034677 Aadish S IND 1528 5 1999
5086183 Aaditt, M K IND 1610 0 1996 i
5027942 Aaditya, Jagadeesh IND 1814 16 1998
25011952 Aadityan G IND 1621 7 2001
5063485 Aadityan, N. IND 1758 8 1996
1427024 Aagaard, Gert DEN 2030 7 1966
1401815 Aagaard, Jacob g DEN 2506 9 1973
1411802 Aagaard, Kasper DEN 1913 0 1992 i
1017942 Aagaard, Michael NED 2075 0 1960
1406248 Aage, Bjarke DEN 2068 0 1978 i
1506064 Aagedal, Geir Ole NOR 1833 7 1957
25021044 Aagney L., Narasimhan IND 1285 6 2000
10205640 Aahelee, Sarker BAN 1577 0 w
25014510 Aakanksha Hagawane IND 1622 0 2000 w
25030388 Aakash Jain IND 1577 7 1998
35004336 Aakash S B IND 1235 10 1998
5093295 Aakasha IND 1620 3 2000 w
504599 Aakio, Seppo FIN 2078 0 1954
1402315 Aalbaek, Kurt Frede Nissen DEN 1440 0 1944
1024388 Aalbers, Klaas NED 1891 0 1955 i
2252465 Aalbersberg Kroon, Pedro ESP 1878 0 1933
2218682 Aalders, Hendricus ESP 2021 0 1930 i
1033948 Aalders, Peter NED 1903 0 1964
501956 Aaltio, Erkki FIN 2118 0 1935
1504452 Aandal, Kristian NOR 2012 0 1985 i`;



Provided that every row and column has the same width as in the sample you posted, you can parse it like this:

// Split the original string into rows (an array of strings), skipping blank lines.
// `data` could be replaced with the contents read from the file.
const rows = data.split(/\r?\n/).filter(row => row.trim() !== "");

function parseRow(row) {
    // Extract each value from the row by the start and end index of its column
    const id = row.slice(0, 10).trim();
    const name = row.slice(10, 44).trim();
    const title = row.slice(44, 48).trim();
    const country = row.slice(48, 53).trim();
    const grade = row.slice(53, 60).trim();
    const games = row.slice(60, 64).trim();
    const born = row.slice(64, 70).trim();
    const flag = row.slice(70, 72).trim();

    return {
        id,
        name,
        title,
        country,
        // Parse numbers; an empty column stays an empty string
        grade: grade && parseInt(grade, 10),
        games: games && parseInt(games, 10),
        born: born && parseInt(born, 10),
        flag
    };
}

// Index the parsed rows by player ID
const parsedRows = rows.reduce((acc, row) => {
    const parsed = parseRow(row);
    acc[parsed.id] = parsed;
    return acc;
}, {});
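Since the question asks for JSON or CSV output, the object keyed by ID can then be serialized. The following is only a sketch under assumptions: the `players` sample, the `FIELDS` list, and the helpers `csvField` and `toCsv` are illustrative names that do not appear in the original answer. Names in this dataset contain commas, so the CSV fields are quoted where needed.

```javascript
// Hypothetical sample shaped like the ID-keyed object built by the reduce() call.
const players = {
  "1407589": { id: "1407589", name: "Aabling-Thomsen, Jakob", title: "f",
               country: "DEN", grade: 2331, games: 18, born: 1985, flag: "" },
  "10207538": { id: "10207538", name: "A E M, Doshtagir", title: "",
                country: "BAN", grade: 1864, games: 0, born: "", flag: "i" }
};

// Column order for the CSV output (an assumption, chosen to match the input file).
const FIELDS = ["id", "name", "title", "country", "grade", "games", "born", "flag"];

// Quote a field when it contains a comma, a double quote, or a line break,
// doubling any embedded double quotes.
function csvField(value) {
  const text = String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Turn the ID-keyed object into CSV text with a header line.
function toCsv(playersById) {
  const lines = Object.values(playersById).map(player =>
    FIELDS.map(field => csvField(player[field])).join(",")
  );
  return [FIELDS.join(","), ...lines].join("\n");
}

const json = JSON.stringify(players, null, 2);
const csv = toCsv(players);
console.log(csv);

// To persist the results, something like the following could be used:
// require("fs").writeFileSync("players.json", json);
// require("fs").writeFileSync("players.csv", csv);
```

Because the keys look like integers, JavaScript iterates them in ascending numeric order for IDs of this size, so the CSV rows come out sorted by ID rather than in file order.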

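The parsing code above is a rough solution, but it appears to solve your problem. It does work on the sample data you provided. If the full dataset differs from your sample, you may need to update the start and end indices of the individual columns.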
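If the indices do need adjusting, it may help to keep them in one table rather than in separate slice() calls. This is a minimal sketch, not the answer's own code: `COLUMNS`, `NUMERIC`, and `parseFixedWidth` are hypothetical names, the offsets are copied from the answer, and, unlike the answer, empty numeric cells become null here.

```javascript
// Column layout copied from the slice() offsets in the answer: [field, start, end).
const COLUMNS = [
  ["id", 0, 10],
  ["name", 10, 44],
  ["title", 44, 48],
  ["country", 48, 53],
  ["grade", 53, 60],
  ["games", 60, 64],
  ["born", 64, 70],
  ["flag", 70, 72],
];

// Fields that should be converted to numbers (an assumption based on the sample).
const NUMERIC = new Set(["grade", "games", "born"]);

// Parse one fixed-width row according to a column table.
function parseFixedWidth(row, columns = COLUMNS) {
  const record = {};
  for (const [field, start, end] of columns) {
    const text = row.slice(start, end).trim();
    if (NUMERIC.has(field)) {
      // Empty numeric cells become null instead of NaN or "".
      record[field] = text === "" ? null : parseInt(text, 10);
    } else {
      record[field] = text;
    }
  }
  return record;
}

// Demo row built by padding each field to its column width, so the spacing
// is explicit; the flag column is left empty.
const demoRow =
  "1407589".padEnd(10) + "Aabling-Thomsen, Jakob".padEnd(34) + "f".padEnd(4) +
  "DEN".padEnd(5) + "2331".padEnd(7) + "18".padEnd(4) + "1985".padEnd(6);

console.log(parseFixedWidth(demoRow));
```

With this layout, a change in the real file's column widths only requires editing the numbers in `COLUMNS`.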

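However, in the sample data you provided the columns are separated only by spaces. If the actual dataset is tab-delimited, a solution like this would be simpler to use: const [id, name, title, country, grade, games, born, flag] = row.split('\t')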
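For the tab-delimited case, the one-liner can be expanded into a small function. This is a sketch that assumes eight tab-separated fields per row in the same order as the header; `parseTabRow`, `toNumber`, and the sample row are hypothetical and not taken from the real file.

```javascript
// Convert a numeric column to a number, or null when the cell is empty or missing.
function toNumber(text) {
  return text === undefined || text.trim() === "" ? null : parseInt(text, 10);
}

// Parse one tab-delimited row into a player object.
function parseTabRow(row) {
  // flag defaults to "" when a row ends before the last column.
  const [id, name, title, country, grade, games, born, flag = ""] = row.split("\t");
  return {
    id,
    name,
    title,
    country,
    grade: toNumber(grade),
    games: toNumber(games),
    born: toNumber(born),
    flag,
  };
}

// Hypothetical tab-delimited row; the last field (flag) is empty.
const sample = "1407589\tAabling-Thomsen, Jakob\tf\tDEN\t2331\t18\t1985\t";
console.log(parseTabRow(sample));
```

Splitting on tabs leaves the commas and spaces inside names untouched, which is why this variant does not need any column offsets.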

Regarding javascript - converting a large txt file to any structured format, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/53937233/
