
f# - Deedle equivalent of pandas.merge


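I am looking to merge two Deedle (F#) frames based on a specific column in each frame, in a similar manner to pandas.DataFrame.merge. The perfect example would be a primary frame that contains columns of data and a (city, state) column, along with an information frame that contains the following columns: (city, state); lat; long. If I want to add the lat and long columns to my primary frame, I would merge the two frames on the (city, state) column.

Here is an example: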

    let primaryFrame =
        [ (0, "Job Name", box "Job 1")
          (0, "City, State", box "Reno, NV")
          (1, "Job Name", box "Job 2")
          (1, "City, State", box "Portland, OR")
          (2, "Job Name", box "Job 3")
          (2, "City, State", box "Portland, OR")
          (3, "Job Name", box "Job 4")
          (3, "City, State", box "Sacramento, CA") ]
        |> Frame.ofValues

    let infoFrame =
        [ (0, "City, State", box "Reno, NV")
          (0, "Lat", box "Reno_NV_Lat")
          (0, "Long", box "Reno_NV_Long")
          (1, "City, State", box "Portland, OR")
          (1, "Lat", box "Portland_OR_Lat")
          (1, "Long", box "Portland_OR_Long") ]
        |> Frame.ofValues

    // See the code for merge_On below.
    let mergedFrame =
        primaryFrame
        |> merge_On infoFrame "City, State" null

This results in 'mergedFrame' looking like this:

    > mergedFrame.Format();;
    val it : string =
    "    Job Name City, State    Lat             Long
    0 -> Job 1    Reno, NV       Reno_NV_Lat     Reno_NV_Long
    1 -> Job 2    Portland, OR   Portland_OR_Lat Portland_OR_Long
    2 -> Job 3    Portland, OR   Portland_OR_Lat Portland_OR_Long
    3 -> Job 4    Sacramento, CA <missing>       <missing>

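I have come up with a way of doing this (the 'merge_On' function used in the example above), but as a sales engineer who is new to F#, I imagine there is a more idiomatic or more efficient way to do it. Below is my function, along with 'removeDuplicateRows', which does what you would expect and is needed by 'merge_On'. If you can suggest a better way of doing that as well, please comment.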

    let removeDuplicateRows column (frame : Frame<'a, 'b>) =
        let nonDupKeys =
            frame.GroupRowsBy(column).RowKeys
            |> Seq.distinctBy (fun (a, b) -> a)
            |> Seq.map (fun (a, b) -> b)
        frame.Rows.[nonDupKeys]


    let merge_On (infoFrame : Frame<'c, 'b>) mergeOnCol missingReplacement
                 (primaryFrame : Frame<'a, 'b>) =
        let frame = primaryFrame.Clone()
        let infoFrame =
            infoFrame
            |> removeDuplicateRows mergeOnCol
            |> Frame.indexRows mergeOnCol
        let initialSeries = frame.GetColumn(mergeOnCol)
        let infoFrameRows = infoFrame.RowKeys
        for colKey in infoFrame.ColumnKeys do
            let newSeries =
                [ for v in initialSeries.ValuesAll do
                    if Seq.contains v infoFrameRows then
                        let key = infoFrame.GetRow(v)
                        yield key.[colKey]
                    else
                        yield box missingReplacement ]
            frame.AddColumn(colKey, newSeries)
        frame

Thanks for your help!

UPDATES:

Switched Frame.indexRowsString to Frame.indexRows to handle cases where the values in 'mergeOnCol' are not strings.

Got rid of infoFrame.Clone() as suggested by Tomas.

Best answer

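Unfortunately, the way Deedle joins frames (only on row/column keys) means that it does not have a nice built-in function for joining frames on a non-key column.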
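To make the row-key restriction concrete, here is a small sketch of a key-based join. The frames 'jobs' and 'coords' are illustrative and not taken from the question; the script assumes the Deedle NuGet package can be loaded in F# Interactive.

```fsharp
#r "nuget: Deedle"
open Deedle

// Illustrative frames (not from the question); both use integer row keys.
let jobs   = frame [ "Job Name" => series [ 1 => "Job 1"; 2 => "Job 2" ] ]
let coords = frame [ "Lat" => series [ 2 => "Lat_2"; 3 => "Lat_3" ] ]

// Join aligns rows by row key only; JoinKind.Left keeps the row keys of 'jobs'.
let joined = jobs.Join(coords, JoinKind.Left)

// Row 1 has no matching key in 'coords', so its "Lat" value is missing.
printfn "%s" (joined.Format())
```

Because the alignment happens on row keys, merging on an ordinary column means first turning that column into the row index of one of the frames.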

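As far as I can see, your approach looks very good to me. You do not need Clone on the infoFrame (because you are not mutating the frame), and I think you can replace infoFrame.GetRow with infoFrame.TryGetRow (so that you do not need to get the keys in advance), but other than that, your code looks fine!

I came up with an alternative, somewhat shorter way of doing this, which looks as follows: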
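The TryGetRow suggestion could look roughly like the sketch below. It is a hedged illustration, not the asker's final code: 'lookupOrDefault' is a hypothetical helper, the info frame is assumed to be already indexed by the merge column with string keys, and TryGetRow is assumed to return an OptionalValue that is empty for an unknown key.

```fsharp
#r "nuget: Deedle"
open Deedle

// Hypothetical helper: fetch one cell from an info frame that is indexed by
// the merge column, returning 'fallback' when the key or the cell is absent.
let lookupOrDefault (info : Frame<string, string>) (colKey : string)
                    (fallback : obj) (key : string) : obj =
    let row = info.TryGetRow<obj>(key)
    if row.HasValue then
        let cell = row.Value.TryGet(colKey)
        if cell.HasValue then cell.Value else fallback
    else
        fallback

// Illustrative info frame keyed directly by "City, State" values.
let info : Frame<string, string> =
    [ ("Reno, NV", "Lat", box "Reno_NV_Lat")
      ("Reno, NV", "Long", box "Reno_NV_Long") ]
    |> Frame.ofValues

let found    = lookupOrDefault info "Lat" null "Reno, NV"
let notFound = lookupOrDefault info "Lat" null "Sacramento, CA"
```

With a helper of this shape, the loop in 'merge_On' would no longer need the 'infoFrameRows' sequence or the 'Seq.contains' check, since the lookup itself reports whether the key exists.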

    // Index the info frame by city/state, so that we can do lookup
    let infoByCity = infoFrame |> Frame.indexRowsString "City, State"

    // Create a new frame with the same row indices as 'primaryFrame'
    // containing the additional information from infoFrame.
    let infoMatched =
        primaryFrame.Rows
        |> Series.map (fun k row ->
            // For every row, we get the "City, State" value of the row and then
            // find the corresponding row with additional information in infoFrame. Using
            // 'ValueOrDefault' will automatically give missing when the key does not exist
            infoByCity.Rows.TryGet(row.GetAs<string>("City, State")).ValueOrDefault)
        // Now turn the series of rows into a frame
        |> Frame.ofRows

    // Now we have two frames with matching keys, so we can join!
    primaryFrame.Join(infoMatched)

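This is a bit shorter and maybe more self-explanatory, but I have not done any tests to check which is faster. Unless performance is a primary concern, I think using the more readable version is a good default choice!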
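If the lookup-then-join steps above are needed for more than one column, they can be wrapped in a function. The sketch below only rearranges the calls already used in the answer; the name 'joinOnColumn' and its parameters are hypothetical, and it assumes the merge column holds strings and that its values are unique in the info frame (otherwise indexing by that column would fail, which is what 'removeDuplicateRows' in the question guards against).

```fsharp
#r "nuget: Deedle"
open Deedle

// Hypothetical wrapper around the index / lookup / join steps shown above.
let joinOnColumn (column : string) (info : Frame<int, string>)
                 (primary : Frame<int, string>) =
    // Index the info frame by the merge column so rows can be looked up by value.
    let infoByKey = info |> Frame.indexRowsString column
    // For each primary row, fetch the matching info row (missing when absent).
    let matched =
        primary.Rows
        |> Series.map (fun _ row ->
            infoByKey.Rows.TryGet(row.GetAs<string>(column)).ValueOrDefault)
        |> Frame.ofRows
    // Both frames now share the primary frame's row keys, so Join can align them.
    primary.Join(matched)

// The example data from the question.
let primaryFrame : Frame<int, string> =
    [ (0, "Job Name", box "Job 1"); (0, "City, State", box "Reno, NV")
      (1, "Job Name", box "Job 2"); (1, "City, State", box "Portland, OR")
      (2, "Job Name", box "Job 3"); (2, "City, State", box "Portland, OR")
      (3, "Job Name", box "Job 4"); (3, "City, State", box "Sacramento, CA") ]
    |> Frame.ofValues

let infoFrame : Frame<int, string> =
    [ (0, "City, State", box "Reno, NV"); (0, "Lat", box "Reno_NV_Lat")
      (0, "Long", box "Reno_NV_Long"); (1, "City, State", box "Portland, OR")
      (1, "Lat", box "Portland_OR_Lat"); (1, "Long", box "Portland_OR_Long") ]
    |> Frame.ofValues

let result = primaryFrame |> joinOnColumn "City, State" infoFrame
printfn "%s" (result.Format())
```

Taking the primary frame as the last parameter keeps the call site pipeline-friendly, in the same way as 'merge_On' in the question.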

For more on f# - Deedle equivalent of pandas.merge, see the similar question on Stack Overflow: https://stackoverflow.com/questions/43810417/
