
python - USPS Package Track API does not return XML child elements of TrackSummary

Reposted. Author: 行者123. Updated: 2023-12-05 05:29:38

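See the temporary solution at the end of the question.

Summary (added 12/24/22 for clarification):

USPS's tracking API does not return responses in the format shown in its documentation. Because there is no EventDate XML element, the actual format makes it hard to extract the event date. Worst case, I can use a regular expression, but I would like to know whether there is a way to receive the API response shown in USPS's documentation.

Details

On page 19 of USPS's Track and Confirm API documentation, the sample response shows <TrackSummary> with child elements (<EventTime>, <EventDate>, etc.):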

Screenshot of USPS's sample response

Here is USPS's sample response as text:

<TrackResponse>
<TrackInfo ID=" XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX ">
<GuaranteedDeliveryDate>June 24, 2022</GuaranteedDeliveryDate>
<TrackSummary>
<EventTime>9:00 am</EventTime>
<EventDate>June 22, 2022</EventDate>
<Event>Delivered, To Agent</Event>
<EventCity>AMARILLO</EventCity>
<EventState>TX</EventState>
<EventZIPCode>79109</EventZIPCode>
<EventCountry/>
<FirmName/>
<Name>RXXXXXX XXXXXXX</Name>
<AuthorizedAgent>false</AuthorizedAgent>
<DeliveryAttributeCode>23</DeliveryAttributeCode>
<GMT>14:00:00</GMT>
<GMTOffset>-05:00</GMTOffset>
</TrackSummary>

However, when the call is actually made, the XML response lacks these child elements inside TrackSummary:

<?xml version="1.0" encoding="UTF-8"?>
<TrackResponse>
<TrackInfo ID="9405511206213782679396">
<TrackSummary>Your item departed our WEST PALM BEACH FL DISTRIBUTION CENTER destination facility on December 23, 2022 at 12:40 pm. The item is currently in transit to the destination.</TrackSummary>
<TrackDetail>Arrived at USPS Regional Facility, December 23, 2022, 4:49 am, WEST PALM BEACH FL DISTRIBUTION CENTER</TrackDetail>
<TrackDetail>In Transit to Next Facility, 12/22/2022, 9:41 pm</TrackDetail>
<TrackDetail>In Transit to Next Facility, 12/22/2022, 1:36 pm</TrackDetail>
<TrackDetail>Departed USPS Facility, 12/22/2022, 5:58 am, HARRISBURG, PA 17112</TrackDetail>
<TrackDetail>Arrived at USPS Regional Origin Facility, 12/21/2022, 10:12 pm, HARRISBURG PA PACKAGE SORTING CENTER</TrackDetail>
<TrackDetail>Departed Post Office, December 21, 2022, 4:34 pm, DALLASTOWN, PA 17313</TrackDetail>
<TrackDetail>USPS picked up item, December 21, 2022, 2:37 pm, DALLASTOWN, PA 17313</TrackDetail>
<TrackDetail>Shipping Label Created, USPS Awaiting Item, December 21, 2022, 2:16 pm, DALLASTOWN, PA 17313</TrackDetail>
</TrackInfo>
</TrackResponse>

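This can be reproduced with Lob's USPS Postman workspace.

The problem I am trying to solve is getting the date out of the TrackSummary data, which currently requires a regular expression because USPS's API does not return the EventDate child element.

Is there an option in the request that makes the API return these useful XML child elements? I could not find one in the documentation, and every sample response I have seen includes them.

I have tried forming the request both in Python and in Lob's USPS workspace, and both XML responses lack the TrackSummary child elements.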
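Since the two response shapes differ only in whether <TrackSummary> has child elements, one way to cope with both is to check for children before deciding how to read it. The following is a minimal sketch using only the standard library; the helper name read_track_summary and the two trimmed sample documents are illustrative assumptions, not part of the USPS API.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Union

# Trimmed version of the documented sample (child elements present).
DOCUMENTED = """\
<TrackResponse>
  <TrackInfo ID="XXXX">
    <TrackSummary>
      <EventTime>9:00 am</EventTime>
      <EventDate>June 22, 2022</EventDate>
      <Event>Delivered, To Agent</Event>
    </TrackSummary>
  </TrackInfo>
</TrackResponse>
"""

# Trimmed version of the response actually received (plain text only).
ACTUAL = """\
<TrackResponse>
  <TrackInfo ID="9405511206213782679396">
    <TrackSummary>Your item departed our WEST PALM BEACH FL DISTRIBUTION CENTER destination facility on December 23, 2022 at 12:40 pm.</TrackSummary>
  </TrackInfo>
</TrackResponse>
"""


def read_track_summary(xml_text: str) -> Optional[Union[Dict[str, str], str]]:
    """Return a dict of child fields if present, else the raw summary text."""
    summary = ET.fromstring(xml_text).find(".//TrackSummary")
    if summary is None:
        return None
    children = list(summary)
    if children:
        # Structured form: map each child tag to its text.
        return {child.tag: (child.text or "") for child in children}
    # Free-text form: the caller still has to parse the date out of it.
    return (summary.text or "").strip()


print(read_track_summary(DOCUMENTED)["EventDate"])
print(read_track_summary(ACTUAL))
```

With this check in place, the regex fallback is only needed when the function returns a string.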

Long-term solution (in progress as of 12/26/22)

@Parfait pointed out that I should use the Package Tracking "Fields" API instead of the Package Track API.

This is how I currently form the XML request with the Package Track API:

from lxml import etree

# base_url, url_vars, and config are defined elsewhere in the asker's project (not shown here).


def generate_url_tracking(tracking_numbers: list[str]) -> str:
    """Generate the USPS tracking request URL.

    :param tracking_numbers: list of tracking-number strings
    :return: tracking URL for calling the USPS API
    """
    xml = generate_xml_tracking(tracking_numbers)
    url = f"{base_url}{url_vars['track']}{xml}"
    return url


def generate_xml_tracking(tracking_numbers: list[str]) -> str:
    """Generate the USPS Track and Confirm API XML.

    :param tracking_numbers: list of tracking-number strings
    :return: XML string
    """
    xml = etree.Element("TrackRequest", {"USERID": config("USPS_USER")})
    # add one TrackID element per tracking number
    for tracking in tracking_numbers:
        etree.SubElement(xml, "TrackID", {"ID": tracking})
    xml_string = etree.tostring(xml, encoding="utf8", method="xml").decode()
    return xml_string

I will update this to a Package Tracking "Fields" API request when I have time.
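As a rough idea of what that update might look like, here is a hedged sketch that builds a "Fields" request with the standard library. The element names (TrackFieldRequest, Revision, ClientIp, SourceId), the API=TrackV2 parameter, and the endpoint URL are written from memory of the USPS Web Tools documentation and should be treated as assumptions to verify; the function names and the default client_ip and source_id values are purely illustrative.

```python
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import urlencode

# Assumption: the Web Tools endpoint; confirm against the USPS documentation.
BASE_URL = "https://secure.shippingapis.com/ShippingAPI.dll"


def generate_xml_track_fields(
    tracking_numbers: List[str],
    user_id: str,
    client_ip: str = "127.0.0.1",    # hypothetical placeholder value
    source_id: str = "example-app",  # hypothetical placeholder value
) -> str:
    """Build a TrackFieldRequest document (element names are assumptions)."""
    root = ET.Element("TrackFieldRequest", {"USERID": user_id})
    ET.SubElement(root, "Revision").text = "1"
    ET.SubElement(root, "ClientIp").text = client_ip
    ET.SubElement(root, "SourceId").text = source_id
    for tracking in tracking_numbers:
        ET.SubElement(root, "TrackID", {"ID": tracking})
    return ET.tostring(root, encoding="unicode")


def generate_url_track_fields(tracking_numbers: List[str], user_id: str) -> str:
    """Return the request URL with the XML safely percent-encoded."""
    xml = generate_xml_track_fields(tracking_numbers, user_id)
    return f"{BASE_URL}?{urlencode({'API': 'TrackV2', 'XML': xml})}"


print(generate_xml_track_fields(["9405511206213782679396"], "USER"))
```

Using urlencode rather than pasting the XML into the URL avoids problems with characters such as spaces, quotes, and angle brackets in the query string.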

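Temporary solution (12/25/22)

Until USPS's actual responses match its API documentation, this solution extracts the last-updated date from <TrackSummary> for several different statuses (pre-shipment, delivered, RTS, etc.).

The TRACK_SUMMARIES dictionary holds test strings for the different statuses. Statuses without a date (no_info, out_for_delivery_no_date) return None.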

import re
from datetime import datetime
from typing import Optional

from dateutil.parser import ParserError, parse

TRACK_SUMMARIES = {
"delivered": """Your
item was delivered in or at the mailbox at 10:23 am on December 24, 2022 in HOBE SOUND, FL 33455.""",
"out_for_delivery": "Out for Delivery, December 13, 2021, 6:10 am, ARLINGTON, VA 22204.",
"out_for_delivery_no_date": "Out for Delivery, Expected Delivery Between 9:45am and 1:45pm",
"arrived_at_post_office": """Arrived at Post Office,
Arrived at USPS Regional Origin Facility, December 11, 2021, 9:23 pm, HARRISBURG PA PACKAGE SORTING CENTER""",
"acceptance": "Acceptance, December 10, 2021, 12:54 pm, DALLASTOWN, PA 17313",
"pre_shipment": "Pre-Shipment Info Sent to USPS, USPS Awaiting Item, December 27, 2021",
"rts": """Your item was returned to the sender on January 31, 2022 at 9:14 am in YORK, PA 17402
because of an incorrect address.""",
"no_info": "The Postal Service could not locate the tracking information for your request",
"label_prepared": "A shipping label has been prepared for your item at 10:47 am on December 16, 2021 in WINSTON",
"forwarded": """Your item was forwarded to a different address at 5:13 pm on January 4, 2022
in REDDING, CA. This was because of forwarding instructions or because the
address or ZIP Code on the label was incorrect.
""",
}

def get_last_updated(track_summary: str) -> Optional[datetime]:
    """Take the USPS TrackSummary string and return the last-updated datetime."""
    # remove the ZIP code since it interferes with the date parser
    track_summary = re.sub(r"\d{5}", "", track_summary)
    months_regex = "January|February|March|April|May|June|July|August|September|October|November|December"
    # keep everything from the first month name to the end of that line
    first_result = re.search(rf"(?={months_regex}).*", track_summary)
    # return early if there's no month
    if not first_result:
        return None
    first_result = first_result.group()
    # some summaries have am/pm and some don't; if present, cut right after it
    result_for_parser = re.search(r".*(?<=am|pm)", first_result)
    if result_for_parser:
        result_for_parser = result_for_parser.group()
    else:
        result_for_parser = first_result
    try:
        # fuzzy parsing is required for dates in certain summaries
        result = parse(result_for_parser, fuzzy=True)
    except ParserError:
        return None
    return result

Sources:

Using the dateutil parser
Regex for finding months

Best answer

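xml.etree.ElementTree is a good way to find a child element by XPath.

It provides only limited support for XPath expressions for locating elements in a tree, but that is enough to find the TrackSummary data.

Find the first TrackSummary element under the root: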

root.find(".//TrackSummary").text ->
Your item departed our WEST PALM BEACH FL DISTRIBUTION CENTER destination facility on December 23, 2022 at 12:40 pm. The item is currently in transit to the destination.

A Python demo:

import xml.etree.ElementTree as ET
import datetime

document = """\
<?xml version="1.0" encoding="UTF-8"?>
<TrackResponse>
<TrackInfo ID="9405511206213782679396">
<TrackSummary>Your item departed our WEST PALM BEACH FL DISTRIBUTION CENTER destination facility on December 23, 2022 at 12:40 pm. The item is currently in transit to the destination.</TrackSummary>
<TrackDetail>Arrived at USPS Regional Facility, December 23, 2022, 4:49 am, WEST PALM BEACH FL DISTRIBUTION CENTER</TrackDetail>
<TrackDetail>In Transit to Next Facility, 12/22/2022, 9:41 pm</TrackDetail>
<TrackDetail>In Transit to Next Facility, 12/22/2022, 1:36 pm</TrackDetail>
<TrackDetail>Departed USPS Facility, 12/22/2022, 5:58 am, HARRISBURG, PA 17112</TrackDetail>
<TrackDetail>Arrived at USPS Regional Origin Facility, 12/21/2022, 10:12 pm, HARRISBURG PA PACKAGE SORTING CENTER</TrackDetail>
<TrackDetail>Departed Post Office, December 21, 2022, 4:34 pm, DALLASTOWN, PA 17313</TrackDetail>
<TrackDetail>USPS picked up item, December 21, 2022, 2:37 pm, DALLASTOWN, PA 17313</TrackDetail>
<TrackDetail>Shipping Label Created, USPS Awaiting Item, December 21, 2022, 2:16 pm, DALLASTOWN, PA 17313</TrackDetail>
</TrackInfo>
</TrackResponse>
"""

def find_between(s, first, last):
    """Return the text between `first` and the next `last`, or "" if either is missing."""
    try:
        start = s.index(first) + len(first)
        end = s.index(last, start)
        return s[start:end]
    except ValueError:
        return ""


root = ET.fromstring(document)

# In this summary, " on " and "." bracket "December 23, 2022 at 12:40 pm"
summary_text = root.find(".//TrackSummary").text
date_time_obj = datetime.datetime.strptime(find_between(summary_text, ' on ', '.'), '%B %d, %Y at %I:%M %p')
print('Date:', date_time_obj.date())
print('Time:', date_time_obj.time())
print('Date-time:', date_time_obj)

Result

$ python track-summary.py
Date: 2022-12-23
Time: 12:40:00
Date-time: 2022-12-23 12:40:00
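The <TrackDetail> rows in the same response can be handled in a similar way, although they mix two date styles ("December 23, 2022, 4:49 am" and "12/22/2022, 9:41 pm"). The sketch below tries one regex and strptime format per style; the names DETAIL_PATTERNS and parse_detail are illustrative, and only the two styles visible in the sample response are covered.

```python
import re
import xml.etree.ElementTree as ET
from datetime import datetime
from typing import Optional

SAMPLE = """\
<TrackResponse>
<TrackInfo ID="9405511206213782679396">
<TrackDetail>Arrived at USPS Regional Facility, December 23, 2022, 4:49 am, WEST PALM BEACH FL DISTRIBUTION CENTER</TrackDetail>
<TrackDetail>In Transit to Next Facility, 12/22/2022, 9:41 pm</TrackDetail>
</TrackInfo>
</TrackResponse>
"""

# (regex, strptime format) pairs for the two date styles seen in the response
DETAIL_PATTERNS = [
    (re.compile(r"[A-Z][a-z]+ \d{1,2}, \d{4}, \d{1,2}:\d{2} [ap]m"), "%B %d, %Y, %I:%M %p"),
    (re.compile(r"\d{1,2}/\d{1,2}/\d{4}, \d{1,2}:\d{2} [ap]m"), "%m/%d/%Y, %I:%M %p"),
]


def parse_detail(text: str) -> Optional[datetime]:
    """Return the event datetime found in one TrackDetail string, if any."""
    for pattern, fmt in DETAIL_PATTERNS:
        match = pattern.search(text)
        if match:
            try:
                return datetime.strptime(match.group(), fmt)
            except ValueError:
                continue  # matched the shape but not a real date; try the next style
    return None


root = ET.fromstring(SAMPLE)
detail_times = [parse_detail(el.text or "") for el in root.findall(".//TrackDetail")]
print(detail_times)
```

Rows whose text matches neither style come back as None, so the caller can decide whether to skip or log them.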

Update: regular-expression parsing

Based on your updated question (the temporary solution from 12/25/22), I added a parsing section that uses the re library.

Code

import re
import numpy as np
from datetime import date, time


def get_date(date_string):
    """Return the first 'Month D, YYYY' date in the string, or None."""
    months = np.array(['January','February','March','April','May','June','July','August','September','October','November','December'])
    pattern = re.compile(r'(January|February|March|April|May|June|July|August|September|October|November|December)\s(\d{2}|\d{1})\,\s(\d{4})')
    match = re.search(pattern, date_string)
    if not match:
        d = None
    else:
        month_data = match.groups()[0]
        # position of the month name in the array, converted to 1-12
        month = int(np.where(months == month_data)[0][0]) + 1
        day = int(match.groups()[1])
        year = int(match.groups()[2])
        try:
            d = date(year, month, day)
        except ValueError:
            d = None  # or handle the error in a different way
    return d


def get_hour_min(hour, minute, am_pm):
    """Convert a 12-hour clock reading to [hour, minute] in 24-hour time."""
    hour = int(hour)
    minute = int(minute)
    if am_pm == 'pm' and hour != 12:
        hour += 12
    elif am_pm == 'am' and hour == 12:
        hour = 0  # 12:xx am is 00:xx in 24-hour time
    return [hour, minute]


def get_time(date_string):
    """Return [start_time, end_time]; end_time is None unless two times are found."""
    pattern = re.compile(r'(\d{2}|\d{1})\:(\d{2})\s*(am|pm)')
    matches = re.findall(pattern, date_string)
    if len(matches) == 2:
        # two times, e.g. "Between 9:45am and 1:45pm"
        hour, minute = get_hour_min(matches[0][0], matches[0][1], matches[0][2])
        start_t = time(hour, minute, 0)
        hour, minute = get_hour_min(matches[1][0], matches[1][1], matches[1][2])
        end_t = time(hour, minute, 0)
        return [start_t, end_t]

    match = re.search(pattern, date_string)
    if not match:
        t = None
    else:
        hour, minute = get_hour_min(match.groups()[0], match.groups()[1], match.groups()[2])
        try:
            t = time(hour, minute, 0)
        except ValueError:
            t = None  # or handle the error in a different way
    return [t, None]

TRACK_SUMMARIES = {
"delivered": """Your
item was delivered in or at the mailbox at 10:23 am on December 24, 2022 in HOBE SOUND, FL 33455.""",
"out_for_delivery": "Out for Delivery, December 13, 2021, 6:10 am, ARLINGTON, VA 22204.",
"out_for_delivery_no_date": "Out for Delivery, Expected Delivery Between 9:45am and 1:45pm",
"arrived_at_post_office": """Arrived at Post Office,
Arrived at USPS Regional Origin Facility, December 11, 2021, 9:23 pm, HARRISBURG PA PACKAGE SORTING CENTER""",
"acceptance": "Acceptance, December 10, 2021, 12:54 pm, DALLASTOWN, PA 17313",
"pre_shipment": "Pre-Shipment Info Sent to USPS, USPS Awaiting Item, December 27, 2021",
"rts": """Your item was returned to the sender on January 31, 2022 at 9:14 am in YORK, PA 17402
because of an incorrect address.""",
"no_info": "The Postal Service could not locate the tracking information for your request",
"label_prepared": "A shipping label has been prepared for your item at 10:47 am on December 16, 2021 in WINSTON",
"forwarded": """Your item was forwarded to a different address at 5:13 pm on January 4, 2022
in REDDING, CA. This was because of forwarding instructions or because the
address or ZIP Code on the label was incorrect.
""",
}

tracks = {}
# parse each summary and store the results by key (delivered, out_for_delivery, and so on)
for key in TRACK_SUMMARIES:
    # collapse line breaks and repeated whitespace into single spaces
    value = " ".join(TRACK_SUMMARIES[key].split())
    found_date = get_date(value)
    start_time, end_time = get_time(value)
    tracks[key] = [found_date, start_time, end_time, value]
    # print(key, '->', value)
    # if found_date is not None:
    #     print('found date: ' + found_date.strftime("%m/%d/%Y"))
    # if start_time is not None:
    #     if end_time is None:
    #         print('time: ' + start_time.strftime("%H:%M:%S"))
    #     else:
    #         print('start time: ' + start_time.strftime("%H:%M:%S") + ' end time: ' + end_time.strftime("%H:%M:%S"))
    # print('=========================================================================')

# read the stored results back by key (tracks['delivered'], tracks['out_for_delivery'], and so on)
for key in tracks.keys():
    found_date, start_time, end_time, value = tracks[key]

    found_date = found_date.strftime("%m/%d/%Y") if found_date is not None else None
    start_time = start_time.strftime("%H:%M:%S") if start_time is not None else None
    end_time = end_time.strftime("%H:%M:%S") if end_time is not None else None

    print(value)
    print(key)
    if found_date is not None:
        print('found date: ' + found_date)
    if start_time is not None:
        if end_time is None:
            print('time: ' + start_time)
        else:
            print('start time: ' + start_time + ' end time: ' + end_time)
    print('------------------------------------------------------------------------')

Result

$ python reg-express.py
Your item was delivered in or at the mailbox at 10:23 am on December 24, 2022 in HOBE SOUND, FL 33455.
delivered
found date: 12/24/2022
time: 10:23:00
------------------------------------------------------------------------
Out for Delivery, December 13, 2021, 6:10 am, ARLINGTON, VA 22204.
out_for_delivery
found date: 12/13/2021
time: 06:10:00
------------------------------------------------------------------------
Out for Delivery, Expected Delivery Between 9:45am and 1:45pm
out_for_delivery_no_date
start time: 09:45:00 end time: 13:45:00
------------------------------------------------------------------------
Arrived at Post Office, Arrived at USPS Regional Origin Facility, December 11, 2021, 9:23 pm, HARRISBURG PA PACKAGE SORTING CENTER
arrived_at_post_office
found date: 12/11/2021
time: 21:23:00
------------------------------------------------------------------------
Acceptance, December 10, 2021, 12:54 pm, DALLASTOWN, PA 17313
acceptance
found date: 12/10/2021
time: 12:54:00
------------------------------------------------------------------------
Pre-Shipment Info Sent to USPS, USPS Awaiting Item, December 27, 2021
pre_shipment
found date: 12/27/2021
------------------------------------------------------------------------
Your item was returned to the sender on January 31, 2022 at 9:14 am in YORK, PA 17402 because of an incorrect address.
rts
found date: 01/31/2022
time: 09:14:00
------------------------------------------------------------------------
The Postal Service could not locate the tracking information for your request
no_info
------------------------------------------------------------------------
A shipping label has been prepared for your item at 10:47 am on December 16, 2021 in WINSTON
label_prepared
found date: 12/16/2021
time: 10:47:00
------------------------------------------------------------------------
Your item was forwarded to a different address at 5:13 pm on January 4, 2022 in REDDING, CA. This was because of forwarding instructions or because the address or ZIP Code on the label was incorrect.
forwarded
found date: 01/04/2022
time: 17:13:00
------------------------------------------------------------------------

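Date/time patterns

I extracted the data from your TRACK_SUMMARIES dictionary. These are the time and date patterns; some lines have no date, and some have a time range between two times.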

10:23 am on December 24, 2022
December 13, 2021, 6:10 am
Between 9:45am and 1:45pm
December 10, 2021, 12:54 pm
December 27, 2021
January 31, 2022 at 9:14 am
at 10:47 am on December 16, 2021
at 5:13 pm on January 4, 2022

Date parsing

(January|February|March|April|May|June|July|August|September|October|November|December)\s(\d{2}|\d{1})\,\s(\d{4})

Items matched by groups; these are used in the code.
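To make the grouping concrete, here is a small standard-library sketch that runs the date pattern against one of the summaries. The name DATE_RE is illustrative, and the day group is written as \d{1,2}, which matches the same strings as (\d{2}|\d{1}).

```python
import re

# Group 1: month name, group 2: day, group 3: year
DATE_RE = re.compile(
    r"(January|February|March|April|May|June|July|August|"
    r"September|October|November|December)\s(\d{1,2}),\s(\d{4})"
)

match = DATE_RE.search("Acceptance, December 10, 2021, 12:54 pm, DALLASTOWN, PA 17313")
month_name, day, year = match.groups()
print(month_name, day, year)  # December 10 2021
```

A summary with no date, such as the out_for_delivery_no_date string, simply produces no match.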

Time parsing

(\d{2}|\d{1})\:(\d{2})\s*(am|pm)

Items matched by groups; these are used in the code.
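The time pattern can be exercised the same way. This sketch uses re.findall to pick up both ends of a delivery window and lets datetime.strptime do the 12-hour to 24-hour conversion; TIME_RE and to_time are illustrative names, not part of the answer's code.

```python
import re
from datetime import datetime

# Group 1: hour, group 2: minute, group 3: am/pm (the space before am/pm is optional)
TIME_RE = re.compile(r"(\d{1,2}):(\d{2})\s*(am|pm)")


def to_time(hour, minute, am_pm):
    """Convert regex groups such as ('1', '45', 'pm') to a datetime.time."""
    return datetime.strptime(f"{hour}:{minute} {am_pm}", "%I:%M %p").time()


summary = "Out for Delivery, Expected Delivery Between 9:45am and 1:45pm"
window = [to_time(*groups) for groups in TIME_RE.findall(summary)]
print(window)  # [datetime.time(9, 45), datetime.time(13, 45)]
```

Letting strptime interpret %I and %p also covers the 12 am and 12 pm edge cases without hand-written arithmetic.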

References

Find string between two substrings

Converting Strings Using datetime

Regexper

regular expression 101

Regarding "python - USPS Package Track API does not return XML child elements of TrackSummary", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/74902976/
