
python - Cleaning phone numbers in a DataFrame column with regex to fit a standard format

Reposted by: 行者123 | Updated: 2023-12-01 23:17:31

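I need to use RegEx to convert the values in a DataFrame column, which holds mobile numbers in several different formats, so that they all follow one single format.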

There are 5 formats in the table, and I want all of them to follow the first one:

  1. +63xxxxxxxxxx #correct format
  2. 63xxxxxxxxxx #add '+'
  3. 09xxxxxxxxx #remove the '0' and add '+63'
  4. 9xxxxxxxxx #add '+63'
  5. 09xx xxxx xxx #remove the spaces
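The five rules above can be collapsed into one regular expression: after removing whitespace, an optional `+63`, `63` or leading `0`, followed by the 10-digit subscriber part. A minimal sketch with only the standard `re` module; `PREFIX` and `normalize_mobile` are hypothetical names, and it assumes every value, once spaces are removed, is one of the five formats. The last sample is a made-up value in format 5.

```python
import re

# Optional '+63', '63' or '0' prefix, then exactly 10 digits (assumption:
# every cleaned value is one of the five formats listed in the question).
PREFIX = re.compile(r"(?:\+?63|0)?(\d{10})")


def normalize_mobile(raw: str) -> str:
    """Return raw in the +63xxxxxxxxxx format (hypothetical helper)."""
    compact = re.sub(r"\s+", "", raw)      # format 5: drop all whitespace
    match = PREFIX.fullmatch(compact)      # whole string must fit the pattern
    if match is None:
        raise ValueError(f"unrecognised mobile number: {raw!r}")
    return "+63" + match.group(1)          # re-attach a single '+63'


for sample in ["+639273710560", "639287703748", "09176971246",
               "9161832409", "0916 1832 409"]:
    print(sample, "->", normalize_mobile(sample))
```

Raising on anything else makes unexpected formats visible instead of silently passing them through.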

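How can I do this? I tried looping over the whole column with a series of ifs, but I keep getting a KeyError. I'm sure there is a better way to do this, so any help is appreciated.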

```python
filename = "./section2/raw-website.csv"
website_df = pd.read_csv(filename)

clean_mobile_list = []

for i in website_df['mobile']:
    if i[0:2] == "+63":
        clean_mobile_list.append(website_df['mobile'][i])
    if i[0] == "9":
        clean_mobile = re.sub("", "+63", website_df['mobile'][i], 1)
        clean_mobile_list.append(clean_mobile)
    if i[0:1] == "09":
        clean_mobile = re.sub("0", "+63", website_df['mobile'][i], 1)
        clean_mobile_list.append(clean_mobile)
    if i[0] == "6":
        clean_mobile = re.sub("", "+", website_df['mobile'][i], 1)
        clean_mobile_list.append(clean_mobile)
    if i[4] == " ":
        clean_mobile = re.sub(" ", "", website_df['mobile'][i])
        clean_mobile_list.append(clean_mobile)

clean_mobile_list
```

```
KeyError                                  Traceback (most recent call last)
<ipython-input-42-c3202695c4eb> in <module>
      8         clean_mobile_list.append(website_df['mobile'][i])
      9     if i[0] == "9":
---> 10         clean_mobile = re.sub("", "+63", website_df['mobile'][i], 1)
     11         clean_mobile_list.append(clean_mobile)
     12     if i[0:1] == "09":

~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/series.py in __getitem__(self, key)
    851
    852         elif key_is_scalar:
--> 853             return self._get_value(key)
    854
    855         if is_hashable(key):

~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/series.py in _get_value(self, label, takeable)
    959
    960         # Similar to Index.get_value, but we do not fall back to positional
--> 961         loc = self.index.get_loc(label)
    962         return self.index._get_values_for_loc(self, loc, label)
    963

~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexes/range.py in get_loc(self, key, method, tolerance)
    352             except ValueError as err:
    353                 raise KeyError(key) from err
--> 354             raise KeyError(key)
    355         return super().get_loc(key, method=method, tolerance=tolerance)
    356

KeyError: '9087091471'
```
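The KeyError comes from the indexing, not from the regex: `for i in website_df['mobile']` yields the cell values, so `website_df['mobile'][i]` asks the Series for an index label such as `'9087091471'`, and the default RangeIndex (0, 1, 2, ...) has no such label. Two checks also compare slices of the wrong length: `i[0:2]` is two characters, so it can never equal `"+63"`, and `i[0:1]` is one character, so it can never equal `"09"`. And because the checks are independent `if`s rather than an `if`/`elif` chain, a value matching two of them (for example `09xx xxxx xxx`, once the `"09"` check is fixed) would be appended twice. A minimal sketch of the same loop with these points addressed, working on `i` directly and using `str.startswith` instead of slices; the small DataFrame stands in for the CSV, and `639287703748` and `0916 1832 409` are made-up values in formats 2 and 5.

```python
import re

import pandas as pd

# Stand-in for website_df (values from the sample data plus two made-up ones).
website_df = pd.DataFrame({"mobile": ["+639273710560", "639287703748",
                                      "09176971246", "9161832409",
                                      "0916 1832 409"]})

# Reproduce the error: i is a value such as "+639273710560", not a row label.
try:
    for i in website_df["mobile"]:
        website_df["mobile"][i]
except KeyError as err:
    print("KeyError:", err)

clean_mobile_list = []
for i in website_df["mobile"]:
    s = re.sub(r"\s+", "", i)      # format 5: remove spaces first
    if s.startswith("+63"):        # format 1: already correct
        clean = s
    elif s.startswith("63"):       # format 2: add '+'
        clean = "+" + s
    elif s.startswith("0"):        # format 3: replace the leading '0' with '+63'
        clean = "+63" + s[1:]
    else:                          # format 4: add '+63'
        clean = "+63" + s
    clean_mobile_list.append(clean)

print(clean_mobile_list)
```

Exactly one value is appended per row, so the list lines up with the DataFrame and could be assigned back as a new column.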

Sample data from the file:

```
               email     fname         lname         mobile
0     3f@hotmail.com      DNLG   JSBEXJFJCEH  +639273710560
1     ec3d@yahoo.com   VJEZSAT     TQGTVEYAL  +639287703748
2  d7a8@protonmai...  QCLCMOTQ  EJRNWDKVUQVX    09176971246
3    adb74@yahoo.com  TIPOSNZB          KXTL     9161832409
```

Best answer

Here is a simple pipeline that does the job:

```python
df['fixed_mobile'] = (df['mobile']
    .str.replace(r'\s+', '', regex=True)                  # remove whitespace
    .str.extract(r'^(?P<prefix>\+63)?0?(?P<number>\d+)')  # split into prefix/number
    .fillna({'prefix': '+63'})                            # fill in the missing prefix
    .apply(''.join, axis=1)                               # join prefix and number
)
```
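One caveat worth checking: the pattern only recognises a prefix that includes the `+`, so a value in format 2 (`63xxxxxxxxxx`) gets no `prefix` match, the whole string is captured as `number`, and `fillna` then produces `+6363xxxxxxxxxx`. The sample data has no such row, so this is easy to miss. A minimal alternative sketch, assuming the column holds only the five listed formats: strip an optional `+63`, `63` or leading `0` with `Series.str.replace`, then prepend `+63`. The DataFrame is a small stand-in, and `639287703748` is a made-up format-2 value.

```python
import pandas as pd

# Small stand-in for the question's data (639287703748 is a made-up format-2 value).
df = pd.DataFrame({"mobile": ["+639273710560", "639287703748", "09176971246",
                              "9161832409", "9161 832 409"]})

compact = df["mobile"].str.replace(r"\s+", "", regex=True)  # remove whitespace

# The answer's pipeline, for comparison: format 2 ends up with a doubled country code.
answer = (compact
          .str.extract(r"^(?P<prefix>\+63)?0?(?P<number>\d+)")
          .fillna({"prefix": "+63"})
          .apply("".join, axis=1))

# Alternative: drop any existing '+63', '63' or leading '0', then add '+63' once.
df["fixed_mobile"] = "+63" + compact.str.replace(r"^(?:\+?63|0)", "", regex=True)

print(answer.tolist())
print(df)
```

The alternation tries `\+?63` before `0`, so `09...` falls through to the `0` branch, and a bare `9...` number matches neither branch and is left unchanged before the prefix is added.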

Output:

```
             email     fname         lname         mobile   fixed_mobile
0   3f@hotmail.com      DNLG   JSBEXJFJCEH  +639273710560  +639273710560
1   ec3d@yahoo.com   VJEZSAT     TQGTVEYAL  +639287703748  +639287703748
2   d7a8@protonmai  QCLCMOTQ  EJRNWDKVUQVX    09176971246  +639176971246
3  adb74@yahoo.com  TIPOSNZB          KXTL     9161832409  +639161832409
4  adb74@yahoo.com  TIPOSNZB          KXTL   9161 832 409  +639161832409
```
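Whichever approach is used, a quick check that every result has the target shape (`+63` followed by ten digits, as in format 1) catches rows the rules did not anticipate. A minimal sketch using `Series.str.fullmatch` (available since pandas 1.1); the first five values are the `fixed_mobile` column from the output above, and the last is a deliberately malformed, hypothetical entry with a doubled country code.

```python
import pandas as pd

fixed = pd.Series(["+639273710560", "+639287703748", "+639176971246",
                   "+639161832409", "+639161832409",
                   "+63639287703748"])  # hypothetical bad value

# True where the entire string is '+63' followed by exactly ten digits.
is_valid = fixed.str.fullmatch(r"\+63\d{10}")

# Show only the rows that still need attention.
print(fixed[~is_valid])
```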

A similar question, "python - Cleaning phone numbers in a DataFrame column with regex to fit a standard format", can be found on Stack Overflow: https://stackoverflow.com/questions/68646140/
