python - TextFileReader parameters in Pandas

Reposted. Author: 行者123. Updated: 2023-12-05 06:05:47
For pandas IO functions such as read_csv and read_fwf, the documentation says that optional keyword arguments are passed to TextFileReader.

**kwds : optional

Optional keyword arguments can be passed to TextFileReader.

However, nothing in the documentation says what the valid arguments are.

What can be passed to TextFileReader?

Best answer

the documentation says that the optional keyword arguments are passed to TextFileReader.

Technically, when you call pandas.io.parsers.read_csv, pandas.io.parsers.read_fwf, or pandas.io.parsers.read_table, the keyword arguments and all other arguments are passed to pandas.io.parsers._read, which in turn passes them to pandas.io.parsers.TextFileReader.
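One way to observe this hand-off from the outside, as a minimal sketch: when chunksize is set, read_csv returns the TextFileReader itself rather than a DataFrame. The sample CSV text below is made up for illustration, and the import path assumes a pandas version that still exposes TextFileReader from pandas.io.parsers.

```python
import io

import pandas as pd
from pandas.io.parsers import TextFileReader

# Made-up sample data, used only for illustration.
csv_text = "a,b\n1,2\n3,4\n5,6\n"

# With chunksize set, read_csv hands back the reader instead of a DataFrame.
reader = pd.read_csv(io.StringIO(csv_text), chunksize=2)
is_reader = isinstance(reader, TextFileReader)

# Iterating the reader yields DataFrames of at most `chunksize` rows.
chunk_lengths = [len(chunk) for chunk in reader]
reader.close()
```

The same keyword arguments you would give to a plain read_csv call can be combined with chunksize, since they all travel through the same path.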

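As we can see below, the __init__ of pandas.io.parsers.TextFileReader assigns some specific kwds to various instance variables and saves whatever the init method does not need in an instance variable named self.orig_options.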

    class TextFileReader(abc.Iterator):
        """
        Passed dialect overrides any of the related parser options
        """

        def __init__(self, f, engine=None, **kwds):

            self.f = f

            if engine is not None:
                engine_specified = True
            else:
                engine = "python"
                engine_specified = False
            self.engine = engine
            self._engine_specified = kwds.get("engine_specified", engine_specified)

            _validate_skipfooter(kwds)

            dialect = _extract_dialect(kwds)
            if dialect is not None:
                kwds = _merge_with_dialect_properties(dialect, kwds)

            if kwds.get("header", "infer") == "infer":
                kwds["header"] = 0 if kwds.get("names") is None else None

            self.orig_options = kwds

            # miscellanea
            self._currow = 0

            options = self._get_options_with_defaults(engine)
            options["storage_options"] = kwds.get("storage_options", None)

            self.chunksize = options.pop("chunksize", None)
            self.nrows = options.pop("nrows", None)
            self.squeeze = options.pop("squeeze", False)

            self._check_file_or_buffer(f, engine)
            self.options, self.engine = self._clean_options(options, engine)

            if "has_index_names" in kwds:
                self.options["has_index_names"] = kwds["has_index_names"]

            self._engine = self._make_engine(self.engine)
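The header handling in the excerpt (header becomes 0 when no names are given, and None otherwise) can be seen through the public read_csv call. This is a small sketch with made-up data, not a statement about every pandas version:

```python
import io

import pandas as pd

# Made-up sample data, used only for illustration.
csv_text = "a,b\n1,2\n"

# No names given: header is inferred as row 0, so "a,b" becomes the columns.
inferred = pd.read_csv(io.StringIO(csv_text))
inferred_columns = list(inferred.columns)

# names given: header is inferred as None, so "a,b" is read as a data row.
named = pd.read_csv(io.StringIO(csv_text), names=["x", "y"])
named_columns = list(named.columns)
named_first_row = list(named.iloc[0])
```

Passing header=0 together with names would instead discard the original header row and substitute the supplied names.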

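As far as I can tell, self.orig_options is only used when the _get_options_with_defaults method is called. This method appears to perform further validation of the options to make sure they apply to whichever engine you told the reader to use.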

    def _get_options_with_defaults(self, engine):
        kwds = self.orig_options

        options = {}

        for argname, default in parser_defaults.items():
            value = kwds.get(argname, default)

            # see gh-12935
            if argname == "mangle_dupe_cols" and not value:
                raise ValueError("Setting mangle_dupe_cols=False is not supported yet")
            else:
                options[argname] = value

        for argname, default in _c_parser_defaults.items():
            if argname in kwds:
                value = kwds[argname]

                if engine != "c" and value != default:
                    if "python" in engine and argname not in _python_unsupported:
                        pass
                    elif value == _deprecated_defaults.get(argname, default):
                        pass
                    else:
                        raise ValueError(
                            f"The {repr(argname)} option is not supported with the "
                            f"{repr(engine)} engine"
                        )
            else:
                value = _deprecated_defaults.get(argname, default)
            options[argname] = value

        if engine == "python-fwf":
            # pandas\io\parsers.py:907: error: Incompatible types in assignment
            # (expression has type "object", variable has type "Union[int, str,
            # None]")  [assignment]
            for argname, default in _fwf_defaults.items():  # type: ignore[assignment]
                options[argname] = kwds.get(argname, default)

        return options
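The engine check in the second loop can be triggered from user code. The sketch below assumes that low_memory is one of the C-parser options the Python engine does not support, and that its default is True; both assumptions come from my reading of the pandas source rather than from the excerpt above, so treat the result as illustrative.

```python
import io

import pandas as pd

# Made-up sample data, used only for illustration.
csv_text = "a,b\n1,2\n"

# Assumption: a non-default low_memory is rejected by the Python engine.
error_message = None
try:
    pd.read_csv(io.StringIO(csv_text), engine="python", low_memory=False)
except ValueError as exc:
    error_message = str(exc)

# The same option is accepted when the C engine is requested.
df = pd.read_csv(io.StringIO(csv_text), engine="c", low_memory=False)
```

The error is raised before any data is parsed, which fits the validation happening while the reader is being constructed.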

If the kwds pass all of this validation, they end up in self.options, which the _make_engine method uses as the arguments to pass to the parser engine.

    def _make_engine(self, engine="c"):
        mapping: Dict[str, Type[ParserBase]] = {
            "c": CParserWrapper,
            "python": PythonParser,
            "python-fwf": FixedWidthFieldParser,
        }
        if engine not in mapping:
            raise ValueError(
                f"Unknown engine: {engine} (valid options are {mapping.keys()})"
            )
        # error: Too many arguments for "ParserBase"
        return mapping[engine](self.f, **self.options)
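To check which engine a reader ended up with, one option is to inspect the engine attribute that __init__ sets in the excerpt. This is a rough sketch: the attribute is an implementation detail rather than documented API, and the data is made up.

```python
import io

import pandas as pd

# Made-up sample data, used only for illustration.
csv_text = "a,b\n1,2\n"

# chunksize makes read_csv return the TextFileReader, so we can inspect it.
reader = pd.read_csv(io.StringIO(csv_text), engine="python", chunksize=1)

# `engine` is the instance attribute assigned in __init__ (not documented API).
engine_name = reader.engine

# The reader is an iterator, so next() returns the first chunk as a DataFrame.
first_chunk = next(reader)
reader.close()
```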

Now, to the question:

What can be passed to TextFileReader?

The answer depends largely on which engine you use and which arguments that engine supports.
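As a practical starting point, the keyword names that read_csv itself accepts can be listed with the standard inspect module. This is only a sketch: the list shows what the public function accepts, not which options a particular engine will honour.

```python
import inspect

import pandas as pd

# Parameters declared by the public read_csv function.
read_csv_params = inspect.signature(pd.read_csv).parameters

# Everything except the file argument is a candidate keyword option.
keyword_names = [
    name for name in read_csv_params if name != "filepath_or_buffer"
]

has_engine = "engine" in read_csv_params
has_chunksize = "chunksize" in read_csv_params
```

Printing keyword_names gives the full list for the installed pandas version; whether a given option is valid still depends on the engine checks described above.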

Regarding python - TextFileReader parameters in Pandas, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/65942903/
