gpt4 book ai didi

After extracting numbers from dataframe, how can I calculate the average price after converting to one unit of measure uniformly(从数据框中提取数字后,如何统一计算换算成一个计量单位后的平均价格)

转载 作者:bug小助手 更新时间:2023-10-25 14:53:58 32 4
gpt4 key购买 nike



I have a dataset which contains different properties rent price. It looks like this:

我有一个包含不同房产租金价格的数据集。它看起来是这样的:


data = {
'prices': [
'$350.00',
'$450.00 pw',
'$325 per week',
'$495pw - Views! White goods!',
'$460p/w + gst and outgoings',
'$300 wk',
'$390pw / $1695pcm',
'$180 pw / $782 pm',
'$375 Per Week/Fully Furnished',
'$350 pw + GST & Outgoings',
'APPLY NOW - From $270 per week',
'$185 per night',
'$400pw incl. power',
'$500 weekly',
'$600 per week pw',
'$850 per week (Fully furnished)',
'FROM $400PW, FURNITURE AND BILLS INCLUDED',
'THE DEAL- $780 PER WEEK',
'THE DEAL: $1,400 PER WEEK',
'$750/W Unfurnished',
'$320 - fully furnished pw',
'$330 PER WEEK | $1,430 P.C.M',
'Enquire Now: $690 per week',
'$460 per week / $1999 per month',
'$490 per week/Under Application approved',
'$1550pw - Location! Rare gem!',
'295 per week', # Example without a dollar sign
'unit 2 - $780pw unit 3 - $760pw', # Example with multiple prices
'$2500 pw high, $1600pw low,$380 pn', # Example with multiple prices
'from $786 - $1572 per week', # Example with multiple prices
'$590 to $639', # Example with a range
'$280 - $290 pw' # Example with a range
]
}

My aim is to clean this "prices" col so that only the weekly rent price is presented.

我的目标是清理这个“价格”列,以便只显示每周的租金价格。


I failed to manage the last five kinds of data, this is what I have done:

我未能管理最后五种数据,以下是我所做的:


df = pd.DataFrame(data)

def extract_weekly_price(text):
price_match = re.search(r'\$?([\d,]+)', text)
if price_match:
price_str = price_match.group(1)
price = int(price_str.replace(',', ''))

# convert to weekly if not
if re.search(r'(per week|p\.w\.|p/w|pw|/w|weekly)', text):
return price
elif 'p.a' in text:
return price / 52
elif re.search(r'(p\.c\.m|pcm|mth|pm)', text):
return price / 4.33
elif 'per night' in text:
return price * 7
else:
return price
else:
return None

df['prices'] = df['prices'].str.lower()
df['Weekly_rent'] = df['prices'].apply(extract_weekly_price).round(3)

How can I modify my code so that I can get the average weekly price for those data with a range such as '$590 to $639' or '$280 - $290 pw'? I would be really appreciate if you could help.

我如何修改我的代码,以便我可以获得这些数据的平均每周价格,范围如‘590美元到639美元’或‘$280-$290 PW’?如果你能帮忙,我将不胜感激。


更多回答

How should be calculated the average price for this one '$2500 pw high, $1600pw low,$380 pn' ? (where there is two weekly prices and one nightly/daily price).

应该如何计算这一个‘$2500 pw高,$1600pw低,$380 pn’的平均价格?(其中有两个周价和一个夜间/日价)。

优秀答案推荐

Here is an example how you can use re + match keyword to get the prices from the text (link to regex101):

下面是一个如何使用re+Match关键字从文本(指向regex101的链接)获取价格的示例:


import re

data = {
"prices": [
"$350.00",
"$450.00 pw",
"$325 per week",
"$495pw - Views! White goods!",
"$460p/w + gst and outgoings",
"$300 wk",
"$390pw / $1695pcm",
"$180 pw / $782 pm",
"$375 Per Week/Fully Furnished",
"$350 pw + GST & Outgoings",
"APPLY NOW - From $270 per week",
"$185 per night",
"$400pw incl. power",
"$500 weekly",
"$600 per week pw",
"$850 per week (Fully furnished)",
"FROM $400PW, FURNITURE AND BILLS INCLUDED",
"THE DEAL- $780 PER WEEK",
"THE DEAL: $1,400 PER WEEK",
"$750/W Unfurnished",
"$320 - fully furnished pw",
"$330 PER WEEK | $1,430 P.C.M",
"Enquire Now: $690 per week",
"$460 per week / $1999 per month",
"$490 per week/Under Application approved",
"$1550pw - Location! Rare gem!",
"295 per week", # Example without a dollar sign
"unit 2 - $780pw unit 3 - $760pw", # Example with multiple prices
"$2500 pw high, $1600pw low,$380 pn", # Example with multiple prices
"from $786 - $1572 per week", # Example with multiple prices
"$590 to $639", # Example with a range
"$280 - $290 pw", # Example with a range
]
}

pat = re.compile(
r"(?i)(?:\$([\d+,.]+)\s*(?:-|to)\s*)?\$?([\d+,.]+)(?=\s*(?:(pw|per week|wk|p?/w|weekly|- fully furnished pw$|$)|(per night|pn)|(pm|per month|p\.c\.m|pcm)))"
)

for i, p in enumerate(data["prices"]):
for price1, price2, week, night, month in pat.findall(p):
match [price1, price2, week, night, month]:
# single price/week
case ["", _, _, "", ""]:
price = float(price2.replace(",", ""))
print(f"{i:<3} CASE 1: {p:<50} {price:<10} / week")

# single price/night
case ["", _, "", _, ""]:
price = float(price2.replace(",", "").replace(".", ""))
print(f"{i:<3} CASE 2: {p:<50} {price:<10} / night")

# single price/month
case ["", _, "", "", _]:
price = float(price2.replace(",", "").replace(".", ""))
print(f"{i:<3} CASE 3: {p:<50} {price:<10} / month")

# range price/week
case [_, _, _, "", ""]:
price1 = float(price1.replace(",", "").replace(".", ""))
price2 = float(price2.replace(",", "").replace(".", ""))
price = (price1 + price2) / 2
print(f"{i:<3} CASE 4: {p:<50} {price:<10} / week (mean)")

# add cases here
# ...

case _:
print(f"{i:<3} UNKNOWN: {p:<50}")

Prints:

打印:


0   CASE 1: $350.00                                            350.0      / week
1 CASE 1: $450.00 pw 450.0 / week
2 CASE 1: $325 per week 325.0 / week
3 CASE 1: $495pw - Views! White goods! 495.0 / week
4 CASE 1: $460p/w + gst and outgoings 460.0 / week
5 CASE 1: $300 wk 300.0 / week
6 CASE 1: $390pw / $1695pcm 390.0 / week
6 CASE 3: $390pw / $1695pcm 1695.0 / month
7 CASE 1: $180 pw / $782 pm 180.0 / week
7 CASE 3: $180 pw / $782 pm 782.0 / month
8 CASE 1: $375 Per Week/Fully Furnished 375.0 / week
9 CASE 1: $350 pw + GST & Outgoings 350.0 / week
10 CASE 1: APPLY NOW - From $270 per week 270.0 / week
11 CASE 2: $185 per night 185.0 / night
12 CASE 1: $400pw incl. power 400.0 / week
13 CASE 1: $500 weekly 500.0 / week
14 CASE 1: $600 per week pw 600.0 / week
15 CASE 1: $850 per week (Fully furnished) 850.0 / week
16 CASE 1: FROM $400PW, FURNITURE AND BILLS INCLUDED 400.0 / week
17 CASE 1: THE DEAL- $780 PER WEEK 780.0 / week
18 CASE 1: THE DEAL: $1,400 PER WEEK 1400.0 / week
19 CASE 1: $750/W Unfurnished 750.0 / week
20 CASE 1: $320 - fully furnished pw 320.0 / week
21 CASE 1: $330 PER WEEK | $1,430 P.C.M 330.0 / week
21 CASE 3: $330 PER WEEK | $1,430 P.C.M 1430.0 / month
22 CASE 1: Enquire Now: $690 per week 690.0 / week
23 CASE 1: $460 per week / $1999 per month 460.0 / week
23 CASE 3: $460 per week / $1999 per month 1999.0 / month
24 CASE 1: $490 per week/Under Application approved 490.0 / week
25 CASE 1: $1550pw - Location! Rare gem! 1550.0 / week
26 CASE 1: 295 per week 295.0 / week
27 CASE 1: unit 2 - $780pw unit 3 - $760pw 780.0 / week
27 CASE 1: unit 2 - $780pw unit 3 - $760pw 760.0 / week
28 CASE 1: $2500 pw high, $1600pw low,$380 pn 2500.0 / week
28 CASE 1: $2500 pw high, $1600pw low,$380 pn 1600.0 / week
28 CASE 2: $2500 pw high, $1600pw low,$380 pn 380.0 / night
29 CASE 4: from $786 - $1572 per week 1179.0 / week (mean)
30 CASE 4: $590 to $639 614.5 / week (mean)
31 CASE 4: $280 - $290 pw 285.0 / week (mean)


You can test this approach and see if the average prices are computed correctly :

您可以测试此方法,并查看是否正确计算了平均价格:


import numpy as np

pat = (
r"(?i)"
r"(\d+[.,]?\d+)\s*" # prices
r"(?:-.*\bfurnished\s*)?" # optional text
r"(p/?w|wdk|per week|weekly|/w|" # weekly
r"p\.?c\.?m|mth|pm|per month|" # monthly
r"per night|pn|" # daily
r"p.a)?" # ??
)

tmp = df["prices"].str.extractall(pat)
fn = lambda x: tmp[1].str.contains(x, case=False, na=False)
s0 = tmp[0].replace(",", "", regex=True).astype(float)

averge = np.select(
[fn("w"), fn("n"), fn("m"), fn("p.a")],
[s0, s0.mul(7), s0.div(4.33), s0.div(52)], default=s0
)

out = (
df[["prices"]].join(
tmp.assign(all_prices=averge.round(2)).groupby(level=0)
.agg(
computed_prices=("all_prices", list), # optional
average=("all_prices", "mean")
)
)
)


Regex : [demo]



Output :

输出:


print(out)


prices computed_prices average
0 $350.00 [350.0] 350.00
1 $450.00 pw [450.0] 450.00
2 $325 per week [325.0] 325.00
3 $495pw - Views! White goods! [495.0] 495.00
4 $460p/w + gst and outgoings [460.0] 460.00
5 $300 wk [300.0] 300.00
6 $390pw / $1695pcm [390.0, 391.45] 390.73
7 $180 pw / $782 pm [180.0, 180.6] 180.30
8 $375 Per Week/Fully Furnished [375.0] 375.00
9 $350 pw + GST & Outgoings [350.0] 350.00
10 APPLY NOW - From $270 per week [270.0] 270.00
11 $185 per night [1295.0] 1295.00
12 $400pw incl. power [400.0] 400.00
13 $500 weekly [500.0] 500.00
14 $600 per week pw [600.0] 600.00
15 $850 per week (Fully furnished) [850.0] 850.00
16 FROM $400PW, FURNITURE AND BILLS INCLUDED [400.0] 400.00
17 THE DEAL- $780 PER WEEK [780.0] 780.00
18 THE DEAL: $1,400 PER WEEK [1400.0] 1400.00
19 $750/W Unfurnished [750.0] 750.00
20 $320 - fully furnished pw [320.0] 320.00
21 $330 PER WEEK | $1,430 P.C.M [330.0, 1430.0] 880.00
22 Enquire Now: $690 per week [690.0] 690.00
23 $460 per week / $1999 per month [460.0, 13993.0] 7226.50
24 $490 per week/Under Application approved [490.0] 490.00
25 $1550pw - Location! Rare gem! [1550.0] 1550.00
26 295 per week [295.0] 295.00
27 unit 2 - $780pw unit 3 - $760pw [780.0, 760.0] 770.00
28 $2500 pw high, $1600pw low,$380 pn [2500.0, 1600.0, 2660.0] 2253.33
29 from $786 - $1572 per week [786.0, 1572.0] 1179.00
30 $590 to $639 [590.0, 639.0] 614.50
31 $280 - $290 pw [280.0, 290.0] 285.00

更多回答

Thank you so much! This helped a lot! I did some modify based on your code, basically it's just for the np.select part

非常感谢!这帮了大忙了!我根据你的代码做了一些修改,基本上只是针对np.select部分

32 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com