python - Is there a NumPy or pandas setting that warns when NaN values are created?

Reposted by: 太空宇宙. Updated: 2023-11-03 11:19:42


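I spend a lot of time working with pandas, which uses NumPy arrays to store numbers.

In my use case there should never be any NaN values: they indicate that something went wrong (usually a pandas-related mistake, such as incorrectly joined DataFrames or incorrectly loaded data).

It would be helpful if pandas or NumPy had a setting that issued a warning as soon as a NaN value appeared in any Series of a DataFrame. (This question is not about replacing or imputing NaN values. Only about warning.)

Yes, it is possible to write lots of local checks at every stage (do one thing, check whether you created NaNs, do another thing, check again, and so on), but that is very verbose and inefficient. What I want is to tell pandas once, as a global setting at the top of my Jupyter notebook: if you ever put a NaN value into a DataFrame, warn immediately.

Does anyone know whether a global setting exists that does this?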

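Best answer

If you only want to issue a warning, you can use df.isnull().values.any() to check whether your DataFrame contains any NaN, and then use the warnings module to issue the warning.

Here is a working example: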

>>> from io import StringIO  # Python 3; the original used Python 2's StringIO module
>>> import pandas as pd 
>>> st = """ 
... col1|col2
... 1|
... 2|3 
... """
>>> df = pd.read_csv(StringIO(st),sep="|") 
>>> df.head() 
   col1  col2
0     1   NaN
1     2     3
>>> import warnings
>>> if df.isnull().values.any(): 
...     warnings.warn("there is NaN")
... 
__main__:2: UserWarning: there is NaN
>>>
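
The check above can be made a little less verbose by putting it in one place. The following is only a sketch, not a pandas feature: `warn_if_nan`, `warn_on_nan` and `join_prices` are hypothetical names introduced here for illustration. The helper names the columns that contain missing values, and the decorator applies it to whatever DataFrame a function returns.

```python
import functools
import warnings

import pandas as pd


def warn_if_nan(df, label="DataFrame"):
    """Warn, naming the affected columns, if df has missing values; return df unchanged."""
    has_nan = df.isna().any()  # one boolean per column
    bad_columns = has_nan[has_nan].index.tolist()
    if bad_columns:
        warnings.warn(f"{label}: NaN found in columns {bad_columns}", stacklevel=2)
    return df


def warn_on_nan(func):
    """Decorator (hypothetical helper): check the DataFrame returned by func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return warn_if_nan(func(*args, **kwargs), label=func.__name__)
    return wrapper


@warn_on_nan
def join_prices(left, right):
    # A left join leaves NaN wherever `right` has no matching key.
    return left.merge(right, on="key", how="left")


left = pd.DataFrame({"key": [1, 2], "qty": [10, 20]})
right = pd.DataFrame({"key": [1], "price": [9.5]})
joined = join_prices(left, right)  # emits a UserWarning that mentions 'price'
```

Because `warn_if_nan` returns its input, it can also be chained with `DataFrame.pipe`, for example `left.pipe(warn_if_nan, label="left")`. This is still a per-step check rather than a global switch, so it only reduces the boilerplate the question complains about.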

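If you are looking for a general setting in pandas: according to the pandas source code, the checks that the DataFrame class performs when constructing a DataFrame do not include any way to warn when NaN is present. Core pandas would therefore have to be changed to add it. Below is an excerpt showing the full set of checks done by the DataFrame class.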

def __init__(self, data=None, index=None, columns=None, dtype=None,
             copy=False):
    if data is None:
        data = {}
    if dtype is not None:
        dtype = self._validate_dtype(dtype)

    if isinstance(data, DataFrame):
        data = data._data

    if isinstance(data, BlockManager):
        mgr = self._init_mgr(data, axes=dict(index=index, columns=columns),
                             dtype=dtype, copy=copy)
    elif isinstance(data, dict):
        mgr = self._init_dict(data, index, columns, dtype=dtype)
    elif isinstance(data, ma.MaskedArray):
        import numpy.ma.mrecords as mrecords
        # masked recarray
        if isinstance(data, mrecords.MaskedRecords):
            mgr = _masked_rec_array_to_mgr(data, index, columns, dtype,
                                           copy)

        # a masked array
        else:
            mask = ma.getmaskarray(data)
            if mask.any():
                data, fill_value = maybe_upcast(data, copy=True)
                data[mask] = fill_value
            else:
                data = data.copy()
            mgr = self._init_ndarray(data, index, columns, dtype=dtype,
                                     copy=copy)

    elif isinstance(data, (np.ndarray, Series, Index)):
        if data.dtype.names:
            data_columns = list(data.dtype.names)
            data = dict((k, data[k]) for k in data_columns)
            if columns is None:
                columns = data_columns
            mgr = self._init_dict(data, index, columns, dtype=dtype)
        elif getattr(data, 'name', None) is not None:
            mgr = self._init_dict({data.name: data}, index, columns,
                                  dtype=dtype)
        else:
            mgr = self._init_ndarray(data, index, columns, dtype=dtype,
                                     copy=copy)
    elif isinstance(data, (list, types.GeneratorType)):
        if isinstance(data, types.GeneratorType):
            data = list(data)
        if len(data) > 0:
            if is_list_like(data[0]) and getattr(data[0], 'ndim', 1) == 1:
                if is_named_tuple(data[0]) and columns is None:
                    columns = data[0]._fields
                arrays, columns = _to_arrays(data, columns, dtype=dtype)
                columns = _ensure_index(columns)

                # set the index
                if index is None:
                    if isinstance(data[0], Series):
                        index = _get_names_from_index(data)
                    elif isinstance(data[0], Categorical):
                        index = _default_index(len(data[0]))
                    else:
                        index = _default_index(len(data))

                mgr = _arrays_to_mgr(arrays, columns, index, columns,
                                     dtype=dtype)
            else:
                mgr = self._init_ndarray(data, index, columns, dtype=dtype,
                                         copy=copy)
        else:
            mgr = self._init_dict({}, index, columns, dtype=dtype)
    elif isinstance(data, collections.Iterator):
        raise TypeError("data argument can't be an iterator")
    else:
        try:
            arr = np.array(data, dtype=dtype, copy=copy)
        except (ValueError, TypeError) as e:
            exc = TypeError('DataFrame constructor called with '
                            'incompatible data and dtype: %s' % e)
            raise_with_traceback(exc)

        if arr.ndim == 0 and index is not None and columns is not None:
            if isinstance(data, compat.string_types) and dtype is None:
                dtype = np.object_
            if dtype is None:
                dtype, data = infer_dtype_from_scalar(data)

            values = np.empty((len(index), len(columns)), dtype=dtype)
            values.fill(data)
            mgr = self._init_ndarray(values, index, columns, dtype=dtype,
                                     copy=False)
        else:
            raise ValueError('DataFrame constructor not properly called!')

    NDFrame.__init__(self, mgr, fastpath=True)

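So you would need to file a feature request to have this added to pandas.

Regarding "python - Is there a NumPy or pandas setting that warns when NaN values are created?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/45322912/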
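
On the NumPy side there is one related switch, though it covers a much narrower case than the question asks for. NumPy's floating-point error handling (`np.seterr` globally, or `np.errstate` as a context manager) controls what happens when an invalid operation such as 0/0 would produce NaN; by default it warns, and it can be set to raise. The sketch below assumes plain NumPy arrays, and `divide_strictly` is a hypothetical name. It does not catch NaN that comes from missing data, for example after a join or a CSV load, and as far as I know pandas silences these NumPy warnings in many of its own operations.

```python
import numpy as np


def divide_strictly(a, b):
    """Divide a by b, raising FloatingPointError if an invalid operation (such as 0/0) occurs."""
    with np.errstate(invalid="raise"):
        return a / b


zeros = np.array([0.0, 0.0])

try:
    divide_strictly(zeros, zeros)  # 0/0 is an "invalid" floating-point operation
    caught = False
except FloatingPointError:
    caught = True
```

Calling `np.seterr(invalid="raise")` once at the top of a notebook applies the same behaviour process-wide for NumPy arithmetic.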
