C++ boost date_input_facet seems to unexpectedly parse dates using an incorrect format passed to the facet constructor

Reposted by 行者123; last updated 2023-11-30 01:39:07

The toy code I used for testing on coliru: http://coliru.stacked-crooked.com/a/4039865d8d4dad52

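I'm getting back into C++ after a long break. I'm writing code to parse a CSV file that may contain several columns holding either dates or empty values. My assumption is that each date column uses a single valid date format, although different columns may use different formats.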

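For each date column, I find the first value that parses successfully as a date, given a std::vector of candidate locales, each carrying a boost date_input_facet. The first successful parse returns the index of the matching locale in my array. Once I have found a suitable format for the first parsable date, I want to fix that format for good, so that I don't waste any more CPU time detecting it.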
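A minimal sketch of the detect-then-fix strategy described above, under stated assumptions: it swaps boost's date_input_facet for a hypothetical hand-rolled strict matcher (`parse_strict`) that only understands %Y, %m and %d (no %b), requires every literal format character to match the input exactly, and rejects input that is not fully consumed. `ColumnFormat` and `Ymd` are likewise hypothetical names, not part of boost.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

struct Ymd { int y, m, d; };

// Hypothetical strict matcher (not boost): understands %Y (exactly 4 digits),
// %m and %d (exactly 2 digits). Any other format character must equal the
// input character, and the whole input must be consumed.
std::optional<Ymd> parse_strict(const std::string& in, const std::string& fmt) {
    Ymd out{0, 0, 0};
    std::size_t i = 0;
    for (std::size_t f = 0; f < fmt.size(); ++f) {
        if (fmt[f] == '%' && f + 1 < fmt.size()) {
            const char spec = fmt[++f];
            if (spec != 'Y' && spec != 'm' && spec != 'd') return std::nullopt;
            const std::size_t width = (spec == 'Y') ? 4 : 2;
            if (i + width > in.size()) return std::nullopt;
            int value = 0;
            for (std::size_t k = 0; k < width; ++k, ++i) {
                if (!std::isdigit(static_cast<unsigned char>(in[i]))) return std::nullopt;
                value = value * 10 + (in[i] - '0');
            }
            if (spec == 'Y') out.y = value;
            else if (spec == 'm') out.m = value;
            else out.d = value;
        } else if (i < in.size() && in[i] == fmt[f]) {
            ++i;                                  // literal matched exactly
        } else {
            return std::nullopt;                  // literal mismatch, e.g. '-' vs '0'
        }
    }
    if (i != in.size()) return std::nullopt;      // trailing characters left over
    if (out.m < 1 || out.m > 12 || out.d < 1 || out.d > 31) return std::nullopt;
    return out;
}

// Hypothetical per-column detector: tries each candidate format until one
// matches, then sticks to that index for every later cell of the column.
class ColumnFormat {
public:
    explicit ColumnFormat(std::vector<std::string> formats) : formats_(std::move(formats)) {}

    std::optional<Ymd> parse(const std::string& cell) {
        if (cell.empty()) return std::nullopt;                    // empty cells are allowed
        if (index_) return parse_strict(cell, formats_[*index_]); // format already fixed
        for (std::size_t k = 0; k < formats_.size(); ++k) {
            if (auto date = parse_strict(cell, formats_[k])) {
                index_ = k;                                       // fix the format for good
                return date;
            }
        }
        return std::nullopt;
    }

    std::optional<std::size_t> index() const { return index_; }

private:
    std::vector<std::string> formats_;
    std::optional<std::size_t> index_;
};
```

In this sketch, one ColumnFormat would be created per CSV column. What keeps "%Y-%m-%d" from claiming 20170101 is the combination of exact literal matching and the full-consumption check, not the order of the candidate list.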

Here is my array of locales:

const std::vector<std::locale> Date::date_formats = {
    std::locale(std::locale::classic(), new date_input_facet("%Y-%m-%d")),
    std::locale(std::locale::classic(), new date_input_facet("%Y/%m/%d")),
    std::locale(std::locale::classic(), new date_input_facet("%m-%d-%Y")),
    std::locale(std::locale::classic(), new date_input_facet("%m/%d/%Y")),
    std::locale(std::locale::classic(), new date_input_facet("%d-%b-%Y")),
    std::locale(std::locale::classic(), new date_input_facet("%Y%m%d")),
};

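I test this with an array of date strings from 20170101 to 20170131. I then print the original date string, the parsed date, and the index into the date_formats vector that was used for parsing.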

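For 20170101 through 20170129, it reports that index 0 worked, i.e. the "%Y-%m-%d" format with dashes?!?! Moreover, my input has digits where that format expects dashes, so it appears to read 20170101 as 2017-10-, drop the trailing dash, and interpret the result as October 2017 with no day, i.e. October 1, 2017. Why isn't it sticking to the format it is supposed to use?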

Some results from my coliru run (pY is the parsed year, and so on):

YYYYMMDD    pY      pM    pD    format_index
20170101    2017    Oct   1     0
20170102    2017    Oct   1     0
20170103    2017    Oct   1     0
20170104    2017    Oct   1     0
20170105    2017    Oct   1     0

For 20170130 and 20170131, the correct format index is reported ("%Y%m%d", index 5).
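To check the explanation above against the table, here is a toy model of it. It is an assumption, not boost's actual source: a literal format character consumes one input character without comparing it, numeric fields read up to their width in digits, a field with no digits left defaults to 1, and only an out-of-range month is rejected. The names `toy_lenient_parse` and `ToyDate` are hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

struct ToyDate { int y = 0, m = 0, d = 0; bool ok = false; };

// Toy model of the behaviour described in the question; an ASSUMPTION, not
// boost's source. Literal format characters skip one input character without
// comparing it; numeric fields read up to `width` digits; a field with no
// digits left defaults to 1; only an out-of-range month is rejected.
ToyDate toy_lenient_parse(const std::string& in, const std::string& fmt) {
    ToyDate out;
    std::size_t i = 0;
    for (std::size_t f = 0; f < fmt.size(); ++f) {
        if (fmt[f] == '%' && f + 1 < fmt.size()) {
            const char spec = fmt[++f];
            const int width = (spec == 'Y') ? 4 : 2;
            int value = 0, digits = 0;
            while (digits < width && i < in.size() &&
                   std::isdigit(static_cast<unsigned char>(in[i]))) {
                value = value * 10 + (in[i++] - '0');
                ++digits;
            }
            if (digits == 0) value = 1;          // assumed default for a missing field
            if (spec == 'Y') out.y = value;
            else if (spec == 'm') out.m = value;
            else if (spec == 'd') out.d = value;
        } else if (i < in.size()) {
            ++i;                                 // skipped, never compared
        }
    }
    out.ok = out.m >= 1 && out.m <= 12;
    return out;
}
```

Under this model, 20170101 with "%Y-%m-%d" becomes 2017-10-01, matching the Oct 1 rows, while 20170130 and 20170131 yield month 13 and are rejected, which would explain why only those two fall through to "%Y%m%d" at index 5.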

Any ideas? I just want it to use exactly the format strings I passed.

Best answer

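I made my own datetime parser that supports multiple formats. I, too, found it hard, if not impossible, to parse strictly with the facilities in the standard library and boost.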

I ended up using strptime, mostly¹.
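A minimal sketch of strict multi-format parsing with strptime, assuming a POSIX system (glibc, musl or a BSD libc) where strptime is declared via <time.h>. The essential check is that strptime not only succeeds but stops exactly at the end of the string; `parse_full` is a hypothetical helper, not the answerer's code.

```cpp
#include <ctime>
#include <optional>
#include <string>
#include <vector>

// Sketch assuming POSIX strptime is available.
// A format counts as a match only if strptime succeeds AND stops exactly at
// the end of the input, so partial matches and trailing junk are rejected.
std::optional<std::tm> parse_full(const std::string& s,
                                  const std::vector<std::string>& formats) {
    for (const auto& fmt : formats) {
        std::tm tm{};
        const char* end = strptime(s.c_str(), fmt.c_str(), &tm);
        if (end != nullptr && *end == '\0') return tm;
    }
    return std::nullopt;
}
```

Unlike the behaviour reported in the question, strptime compares literal characters such as '-' against the input, so "%Y-%m-%d" fails on 20170131 and the search moves on to "%Y%m%d".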

`adaptive_parser`

It is meant to be seeded with a list of supported formats, in order of preference. By default the parser is not adaptive (the mode is fixed).

In adaptive mode, the format handling can be set to

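sticky (always reuse the first matched format)
ban_failed (remove failed patterns from the list; banning only happens on a successful parse, to avoid banning every pattern on invalid input)
mru (keep the list but reorder it to improve performance)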
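To make the modes concrete, here is a hypothetical re-creation of the policies as described; it is not the answerer's adaptive_parser, whose source is not shown. The matcher is injected so that only the list-management logic is visible, and `shape_match` (where '#' stands for one digit) is a toy stand-in for real date parsing. `Mode`, `FormatList` and `shape_match` are all illustrative names.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

enum class Mode { fixed, sticky, ban_failed, mru };

// Toy matcher standing in for real date parsing: '#' matches one digit,
// every other pattern character must match exactly, lengths must agree.
bool shape_match(const std::string& in, const std::string& pattern) {
    if (in.size() != pattern.size()) return false;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const bool digit = std::isdigit(static_cast<unsigned char>(in[i])) != 0;
        if (pattern[i] == '#' ? !digit : in[i] != pattern[i]) return false;
    }
    return true;
}

// Hypothetical re-creation of the described modes (NOT adaptive_parser itself).
class FormatList {
public:
    using Matcher = std::function<bool(const std::string&, const std::string&)>;

    FormatList(Mode mode, std::vector<std::string> patterns, Matcher match)
        : mode_(mode), patterns_(std::move(patterns)), match_(std::move(match)) {}

    // Returns the pattern that matched, or std::nullopt if none did.
    std::optional<std::string> parse(const std::string& input) {
        std::vector<std::size_t> failed;
        for (std::size_t k = 0; k < patterns_.size(); ++k) {
            if (!match_(input, patterns_[k])) { failed.push_back(k); continue; }
            const std::string hit = patterns_[k];
            const auto pos = patterns_.begin() + static_cast<std::ptrdiff_t>(k);
            switch (mode_) {
            case Mode::fixed:            // the list never changes
                break;
            case Mode::sticky:           // keep only the first match
                patterns_ = {hit};
                break;
            case Mode::ban_failed:       // drop patterns that failed before this hit
                for (auto it = failed.rbegin(); it != failed.rend(); ++it)
                    patterns_.erase(patterns_.begin() + static_cast<std::ptrdiff_t>(*it));
                break;
            case Mode::mru:              // move the hit to the front, keep the rest
                std::rotate(patterns_.begin(), pos, pos + 1);
                break;
            }
            return hit;
        }
        return std::nullopt;             // total failure: nothing is banned
    }

    const std::vector<std::string>& patterns() const { return patterns_; }

private:
    Mode mode_;
    std::vector<std::string> patterns_;
    Matcher match_;
};
```

In this sketch ban_failed only removes the patterns that failed before the successful one in the same call, and nothing is removed when every pattern fails, mirroring the rule that banning only happens on a successful parse.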

Caution:
If formats are ambiguous (e.g. mm-dd-yyyy vs dd-mm-yyyy), allowing re-ordering leads to unpredictable results.

⇒ Only use mru when there are no ambiguous formats
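The ambiguity problem can be shown with just two readings of "AA-BB-YYYY". In this hypothetical sketch, `MruReadings` tries the most recently successful reading first, and `read_as` is a toy sscanf-based reader (it does not reject trailing characters). After one input that only parses as dd-mm-yyyy, the same string 01-02-2017 switches from January 2 to February 1.

```cpp
#include <cstdio>
#include <string>

struct MD { int month; int day; };

// Toy reader for "AA-BB-YYYY": month_first selects mm-dd-yyyy, otherwise
// dd-mm-yyyy. Fails if the chosen reading gives an impossible month or day.
bool read_as(const std::string& s, bool month_first, MD& out) {
    int a = 0, b = 0, year = 0;
    if (std::sscanf(s.c_str(), "%2d-%2d-%4d", &a, &b, &year) != 3) return false;
    const MD md = month_first ? MD{a, b} : MD{b, a};
    if (md.month < 1 || md.month > 12 || md.day < 1 || md.day > 31) return false;
    out = md;
    return true;
}

// Hypothetical two-format MRU list: whichever reading succeeded last is
// tried first next time.
struct MruReadings {
    bool try_month_first = true;   // start with mm-dd-yyyy

    bool parse(const std::string& s, MD& out) {
        if (read_as(s, try_month_first, out)) return true;
        if (read_as(s, !try_month_first, out)) {
            try_month_first = !try_month_first;   // re-order the list
            return true;
        }
        return false;
    }
};
```

With a fixed order, 01-02-2017 would always be read as January 2; it is the re-ordering that makes the result depend on earlier input.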

NOTE:
The function object is stateful. In algorithms, pass it by reference (std::ref(obj)) to avoid copying the patterns and to ensure correct adaptive behaviour.
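The std::ref advice applies to any stateful function object, because standard algorithms take them by value. A small illustration with a hypothetical `Counter` standing in for a stateful parser:

```cpp
#include <algorithm>
#include <functional>
#include <string>
#include <vector>

// A stateful function object: counts how many strings it has seen.
struct Counter {
    int seen = 0;
    void operator()(const std::string&) { ++seen; }
};

// Passed by value: std::for_each updates its own copy, which is then
// discarded, so the caller's object is left untouched.
int count_by_value(const std::vector<std::string>& v, Counter& c) {
    std::for_each(v.begin(), v.end(), c);
    return c.seen;
}

// Passed through std::ref: every call goes to the caller's object.
int count_by_ref(const std::vector<std::string>& v, Counter& c) {
    std::for_each(v.begin(), v.end(), std::ref(c));
    return c.seen;
}
```

An adaptive parser copied the same way would learn its sticky or MRU ordering in the copy and forget it as soon as the algorithm returns.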

Demo

I tried the parser on your test data:

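#include "adaptive_parser.h" // the answerer's own library; its source is not shown here
#include <boost/date_time/gregorian/greg_date.hpp>
#include <boost/date_time/posix_time/posix_time.hpp>
#include <algorithm>  // std::generate, std::transform
#include <functional> // std::mem_fn
#include <iomanip>    // std::setfill, std::setw
#include <iostream>
#include <iterator>   // std::back_inserter
#include <sstream>
#include <string>
#include <vector>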

class Date{
public:
    Date() : y(0), m(0), d(0) {}
    Date(int yy, int mm, int dd) : y(yy), m(mm), d(dd) {}
    Date(boost::gregorian::date dt) : y(dt.year()), m(dt.month()), d(dt.day()) {}
    Date(std::string const& delimitedString);

    std::string to_string() const;

    int getYear()  const { return y; }
    int getMonth() const { return m; }
    int getDay()   const { return d; }
 private:
    using parser_t = mylib::datetime::adaptive_parser;
    parser_t parser { parser_t::full_match, 
        {
            "%Y-%m-%d", "%Y/%m/%d",
            "%m-%d-%Y", "%m/%d/%Y",
            "%d-%b-%Y",
            "%Y%m%d",
        } };

    int y, m, d;
};

Date::Date(const std::string& delimitedString)
{
    using namespace boost::posix_time;

    // the parser's result is treated as a duration since 1970-01-01 (the Unix epoch)
    auto t = ptime({1970,1,1}) + seconds(parser(delimitedString).count());

    *this = Date(t.date());
}

std::string Date::to_string() const
{
    std::ostringstream os;

    os << std::setfill('0')
       << std::setw(4) << y 
       << std::setw(2) << m 
       << std::setw(2) << d;

    return os.str();
}

int main() {
    std::vector<Date> vec(31);
    std::generate(vec.begin(), vec.end(), [i=1]() mutable { return Date(2017,1,i++); });

    std::vector<std::string> strvec;
    std::transform(vec.begin(), vec.end(), std::back_inserter(strvec), std::mem_fn(&Date::to_string));

    std::cout << "YYYYMMDD\tpY\tpM\tpD\tformat_index\n";

    for (auto& str : strvec) {
        Date parsed(str);

        std::cout << str 
            << "\t" << parsed.getYear()
            << "\t" << parsed.getMonth()
            << "\t" << parsed.getDay()
            << "\t" << "?"
            << "\n";
    }
}
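In the Date(const std::string&) constructor above, the parser's result is added to ptime({1970,1,1}), so it acts as a duration since the Unix epoch. For readers without boost, the same conversion can be sketched with std::chrono (C++17) and Howard Hinnant's public-domain civil_from_days algorithm. `date_from_epoch_seconds` and `Civil` are hypothetical names, and the input is assumed to count UTC seconds.

```cpp
#include <chrono>

struct Civil { int y; unsigned m; unsigned d; };

// Howard Hinnant's civil_from_days: days since 1970-01-01 to a proleptic
// Gregorian year/month/day.
Civil civil_from_days(long long z) {
    z += 719468;
    const long long era = (z >= 0 ? z : z - 146096) / 146097;
    const unsigned doe = static_cast<unsigned>(z - era * 146097);          // [0, 146096]
    const unsigned yoe = (doe - doe/1460 + doe/36524 - doe/146096) / 365;  // [0, 399]
    const long long y = static_cast<long long>(yoe) + era * 400;
    const unsigned doy = doe - (365*yoe + yoe/4 - yoe/100);                // [0, 365]
    const unsigned mp = (5*doy + 2) / 153;                                 // [0, 11]
    const unsigned d = doy - (153*mp + 2) / 5 + 1;                         // [1, 31]
    const unsigned m = mp < 10 ? mp + 3 : mp - 9;                          // [1, 12]
    return Civil{static_cast<int>(y + (m <= 2)), m, d};
}

// Same role as ptime({1970,1,1}) + seconds(n) in the demo, assuming UTC.
Civil date_from_epoch_seconds(std::chrono::seconds s) {
    using days = std::chrono::duration<long long, std::ratio<86400>>;
    const auto whole_days = std::chrono::floor<days>(s);  // rounds toward -infinity
    return civil_from_days(whole_days.count());
}
```

Using std::chrono::floor rather than a plain cast keeps instants before 1970 on the correct calendar day.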

Prints:

YYYYMMDD    pY      pM    pD    format_index
20170101    2017    1     1     ?
20170102    2017    1     2     ?
20170103    2017    1     3     ?
20170104    2017    1     4     ?
20170105    2017    1     5     ?
20170106    2017    1     6     ?
20170107    2017    1     7     ?
20170108    2017    1     8     ?
20170109    2017    1     9     ?
20170110    2017    1     10    ?
20170111    2017    1     11    ?
20170112    2017    1     12    ?
20170113    2017    1     13    ?
20170114    2017    1     14    ?
20170115    2017    1     15    ?
20170116    2017    1     16    ?
20170117    2017    1     17    ?
20170118    2017    1     18    ?
20170119    2017    1     19    ?
20170120    2017    1     20    ?
20170121    2017    1     21    ?
20170122    2017    1     22    ?
20170123    2017    1     23    ?
20170124    2017    1     24    ?
20170125    2017    1     25    ?
20170126    2017    1     26    ?
20170127    2017    1     27    ?
20170128    2017    1     28    ?
20170129    2017    1     29    ?
20170130    2017    1     30    ?
20170131    2017    1     31    ?

¹ Mostly: the time zone handling needed adjusting.
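The footnote doesn't say which time zone details needed adjusting. One common pitfall when turning strptime's std::tm into an epoch count is that std::mktime interprets the fields as local time, so the result shifts with the TZ setting. A sketch using timegm, which treats them as UTC; timegm is a non-standard extension (glibc, musl, BSDs) assumed to be available here, and `utc_seconds` is a hypothetical helper.

```cpp
#include <ctime>

// Build a std::tm from calendar fields and convert it as UTC.
// timegm is non-standard (assumed available); std::mktime would instead
// apply the local time zone.
std::time_t utc_seconds(int year, int month, int day) {
    std::tm tm{};
    tm.tm_year = year - 1900;   // tm_year counts from 1900
    tm.tm_mon = month - 1;      // tm_mon is 0-based
    tm.tm_mday = day;
    return timegm(&tm);
}
```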

Regarding "C++ boost date_input_facet seems to unexpectedly parse dates using an incorrect format passed to the facet constructor", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/46474237/
