python - Extracting header data from a text file with Pandas

Reposted. Author: 太空宇宙. Updated: 2023-11-03 15:50:38

I previously asked a question about how to load this .txt file with pandas. I am trying to use pandas.read_csv.

I found that I can't read this file with read_csv unless I delete the header data (everything up to the "#" lines).

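The problem is that I need to pull values out of the header data, such as the well name, well KB and well type. Is there a way to do this with Pandas, or do I have to load it some other way?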

My original question is here:

Pandas.read_csv error tokenizing data

Original text file:

# WELL TRACE FROM PETREL 
# WELL NAME: ZZ-0113
# WELL HEAD X-COORDINATE: 9999999.00000000 (m)
# WELL HEAD Y-COORDINATE: 9999999.00000000 (m)
# WELL KB: 159.00000000 (ft)
# WELL TYPE: OIL
# MD AND TVD ARE REFERENCED (=0) AT KB AND INCREASE DOWNWARDS
# ANGLES ARE GIVEN IN DEGREES
# XYZ TRACE IS GIVEN IN COORDINATE SYSTEM WGS_1924_UTM_Zone_42N
# AZIMUTH REFERENCE TRUE NORTH
# DX DY ARE GIVEN IN GRID NORTH IN m-UNITS
# DEPTH (Z, TVD) GIVEN IN ft-UNITS
#======================================================================================================================================
MD X Y Z TVD DX DY AZIM INCL DLS
#======================================================================================================================================
0.0000000000 999999.00000 9999999.0000 159.00000000 0.0000000000 0.0000005192 -0.000000000 1.3487006929 0.0000000000 0.0000000000
132.00000000 999999.08032 9999999.9116 27.000774702 131.99922530 0.0803153923 -0.088388779 139.08870069 0.3400000000 0.2575757504
221.00000000 999999.19115 9999999.8017 -61.99775149 220.99775149 0.1911487882 -0.198290891 132.93870069 0.3200000000 0.0456726104

Best answer

You can parse the file using the comment character "#" as the separator, then pull the key/value pairs out with pandas str.extract:

from io import StringIO
import pandas as pd

txt = """# WELL TRACE FROM PETREL
# WELL NAME: ZZ-0113
# WELL HEAD X-COORDINATE: 9999999.00000000 (m)
# WELL HEAD Y-COORDINATE: 9999999.00000000 (m)
# WELL KB: 159.00000000 (ft)
# WELL TYPE: OIL
# MD AND TVD ARE REFERENCED (=0) AT KB AND INCREASE DOWNWARDS
# ANGLES ARE GIVEN IN DEGREES
# XYZ TRACE IS GIVEN IN COORDINATE SYSTEM WGS_1924_UTM_Zone_42N
# AZIMUTH REFERENCE TRUE NORTH
# DX DY ARE GIVEN IN GRID NORTH IN m-UNITS
# DEPTH (Z, TVD) GIVEN IN ft-UNITS
#======================================================================================================================================
MD X Y Z TVD DX DY AZIM INCL DLS
#======================================================================================================================================
0.0000000000 999999.00000 9999999.0000 159.00000000 0.0000000000 0.0000005192 -0.000000000 1.3487006929 0.0000000000 0.0000000000
132.00000000 999999.08032 9999999.9116 27.000774702 131.99922530 0.0803153923 -0.088388779 139.08870069 0.3400000000 0.2575757504
221.00000000 999999.19115 9999999.8017 -61.99775149 220.99775149 0.1911487882 -0.198290891 132.93870069 0.3200000000 0.0456726104"""

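# Split every line on '#': header lines put their text in column 1,
# lines without a '#' (the data rows) leave column 1 empty (NaN)
header_parse = pd.read_csv(StringIO(txt), sep='#', skipinitialspace=True, header=None)
hd = header_parse.iloc[:, 1].dropna()

# Keep only "KEY: VALUE" lines and split them into two named columns
# (raw string so the regex escapes are not treated as string escapes)
hd.str.extract(r'\s*(?P<key>[^:]+)\s*:\s*(?P<value>.+)', expand=True).dropna()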

key value
1 WELL NAME ZZ-0113
2 WELL HEAD X-COORDINATE 9999999.00000000 (m)
3 WELL HEAD Y-COORDINATE 9999999.00000000 (m)
4 WELL KB 159.00000000 (ft)
5 WELL TYPE OIL
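The extracted key/value frame is easy to turn into a lookup table. The sketch below is self-contained (it rebuilds a few header lines as a Series shaped like hd above) and reuses the answer's regex. The split of a value such as "159.00000000 (ft)" into a number and a unit is an assumption about this particular file's "<number> (<unit>)" layout, not something pandas does for you:

```python
import re
import pandas as pd

# Header text with the leading '#' already stripped (same shape as `hd` above)
hd = pd.Series([
    "WELL TRACE FROM PETREL",
    "WELL NAME: ZZ-0113",
    "WELL HEAD X-COORDINATE: 9999999.00000000 (m)",
    "WELL KB: 159.00000000 (ft)",
    "WELL TYPE: OIL",
])

# Same regex as the answer; rows without a ':' come back as NaN and are dropped
kv = hd.str.extract(r'\s*(?P<key>[^:]+)\s*:\s*(?P<value>.+)', expand=True).dropna()

# Key/value rows -> plain dict for lookups by name
meta = dict(zip(kv['key'], kv['value']))

# Assumption: numeric values look like "<number> (<unit>)"
kb_text, kb_unit = re.match(r'([-+\d.]+)\s*\((\w+)\)', meta['WELL KB']).groups()
kb = float(kb_text)

print(meta['WELL NAME'], kb, kb_unit)  # ZZ-0113 159.0 ft
```

A dict keeps the lookup readable (meta['WELL TYPE']) without indexing into the DataFrame by row number.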

To get the rest of the data:

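# comment='#' skips every line that starts with '#'; sep=r'\s+' splits on runs of
# whitespace (delim_whitespace=True does the same but is deprecated in recent pandas)
df = pd.read_csv(StringIO(txt), comment='#', sep=r'\s+')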
df
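Both steps can also be folded into one pass over the text: comment lines go into a dict, everything else goes to read_csv. This is a sketch assuming the layout shown in the question (metadata as "# KEY: VALUE" lines, whitespace-separated columns); read_well_trace is a hypothetical helper name, and plain string handling replaces the answer's sep='#' trick for the header part:

```python
from io import StringIO
import pandas as pd

def read_well_trace(text):
    """Hypothetical helper: return (header dict, data DataFrame) from one trace text."""
    meta = {}
    data_lines = []
    for line in text.splitlines():
        if line.startswith('#'):
            # Metadata lines look like "# KEY: VALUE"; banner/separator lines have no ':'
            key, sep, value = line.lstrip('#').partition(':')
            if sep:
                meta[key.strip()] = value.strip()
        else:
            data_lines.append(line)
    # Remaining lines: the column-name row followed by whitespace-separated numbers
    df = pd.read_csv(StringIO('\n'.join(data_lines)), sep=r'\s+')
    return meta, df

sample = """# WELL NAME: ZZ-0113
# WELL KB: 159.00000000 (ft)
#==========
MD X Y Z TVD DX DY AZIM INCL DLS
#==========
0.0000000000 999999.00000 9999999.0000 159.00000000 0.0000000000 0.0000005192 -0.000000000 1.3487006929 0.0000000000 0.0000000000
132.00000000 999999.08032 9999999.9116 27.000774702 131.99922530 0.0803153923 -0.088388779 139.08870069 0.3400000000 0.2575757504
"""

meta, df = read_well_trace(sample)
print(meta['WELL NAME'], df.shape)  # ZZ-0113 (2, 10)
```

For a file on disk, pass its contents as a string (for example the result of reading "trace.txt", a placeholder name). Reading the text once avoids parsing the same file twice.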


For more on python - Extracting header data from a text file with Pandas, see the similar question on Stack Overflow: https://stackoverflow.com/questions/41329217/
