I have a table that stores information about my ancestors. As an example, I created a similar table inspired by The Godfather:
|--------+---+-------------+-----------+------+------+--------+--------+----------------+----------------|
| ID | S | First name | Last name | DoB | DoD | FID | MID | Place of birth | Job |
|--------+---+-------------+-----------+------+------+--------+--------+----------------+----------------|
| AnAn | M | Antonio | Andolini | | 1901 | | | Corleone | |
| SiAn | F | Signora | Andolini | | 1901 | | | Corleone | housewife |
| PaAn87 | M | Paolo | Andolini | 1887 | 1901 | AnAn | SiAn | | |
| ViCo92 | M | Vito | Corleone | 1892 | 1954 | AnAn | SiAn | Corleone | godfather |
| CaCo97 | F | Carmella | Corleone | 1897 | 1959 | | | | |
| ToHa10 | M | Tom | Hagen | 1910 | 1970 | ViCo92 | CaCo97 | New York | Consigliere |
| SaCo16 | M | Santino | Corleone | 1916 | 1948 | ViCo92 | CaCo97 | New York | gangster |
| SaCo17 | F | Sandra | Colombo | 1917 | | | | Messina | |
| FrCo19 | M | Frederico | Corleone | 1919 | 1959 | ViCo92 | CaCo97 | New York | Casino Manager |
| MiCo20 | M | Michael | Corleone | 1920 | 1997 | ViCo92 | CaCo97 | New York | godfather |
| ThHa20 | F | Theresa | Hagen | 1920 | | | | New Jersey | Art expert |
| LuMa23 | F | Lucy | Mancini | 1923 | | | | | Hotel employee |
| KaAd24 | F | Kay | Adams | 1934 | | | | | |
| FrCo37 | F | Francessa | Corleone | 1937 | | SaCo16 | SaCo17 | | |
| KaCo37 | F | Kathryn | Corleone | 1937 | | SaCo16 | SaCo17 | | |
| FrCo40 | F | Frank | Corleone | 1940 | | SaCo16 | SaCo17 | | |
| SaCo45 | M | Santino Jr. | Corleone | 1945 | | SaCo16 | SaCo17 | | |
| FrHa | M | Frank | Hagen | 1940 | | ToHa10 | Th20 | | |
| AnHa42 | M | Andrew | Hagen | 1942 | | ToHa10 | Th20 | | Priest |
| ViMa | M | Vincent | Mancini | 1948 | | SaCo16 | LuMa23 | New York | Godfather |
| GiHa58 | F | Gianna | Hagen | 1948 | | ToHa10 | Th20 | | |
| AnCo51 | M | Anthony | Corleone | 1951 | | MiCo20 | KaAd24 | New York | Singer |
| MaCo53 | F | Mary | Corleone | 1953 | 1979 | MiCo20 | KaAd24 | New York | Student |
| ChHa54 | F | Christina | Hagen | 1954 | | ToHa10 | Th20 | | |
| CoCo27 | F | Constanzia | Corleone | 1927 | | ViCo92 | CaCo97 | New York | rentier |
| CaRi20 | M | Carlo | Rizzi | 1920 | 1955 | | | Nevada | Bookmaker |
| ViRi49 | M | Victor | Rizzi | 1949 | | CaRi20 | CoCo27 | New York | |
| MiRi | M | Michael | Rizzi | 1955 | | CaRi20 | CoCo27 | | |
|--------+---+-------------+-----------+------+------+--------+--------+----------------+----------------|
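Here, the relationships between individuals can be understood as a directed acyclic graph (DAG). My goal is to visualize this table as a family tree by drawing it as a graph.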
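Whether such a table really is acyclic (nobody ends up as their own ancestor) can be checked before drawing anything. Below is a minimal sketch using the standard-library graphlib module (Python 3.9+). It assumes the relationships are available as (child, parent) pairs taken from the ID, FID and MID columns, and it reproduces only a few pairs from the table. The helper name is_dag is made up for illustration.

```python
from graphlib import TopologicalSorter, CycleError

# A few (child, parent) pairs copied from the table above.
pairs = [
    ("PaAn87", "AnAn"), ("PaAn87", "SiAn"),
    ("ViCo92", "AnAn"), ("ViCo92", "SiAn"),
    ("MiCo20", "ViCo92"), ("MiCo20", "CaCo97"),
    ("AnCo51", "MiCo20"), ("AnCo51", "KaAd24"),
]

def is_dag(pairs):
    """Return True if the child -> parent relation contains no cycle."""
    graph = {}
    for child, parent in pairs:
        # TopologicalSorter expects node -> iterable of predecessors;
        # here a person's predecessors are their parents.
        graph.setdefault(child, set()).add(parent)
    try:
        # static_order() raises CycleError once iteration detects a cycle.
        list(TopologicalSorter(graph).static_order())
        return True
    except CycleError:
        return False

print(is_dag(pairs))                          # True
print(is_dag(pairs + [("AnAn", "AnCo51")]))   # False: AnAn would be his own ancestor
```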
First I build an edge list in which ID is the start vertex and ParentID is the end vertex:
import pandas as pd
rawdf = pd.read_csv('corleone.csv')
# Stack the mother and father columns into one (Child, ParentID) edge list
el1 = rawdf[['ID','MID']]
el2 = rawdf[['ID','FID']]
el1.columns = ['Child', 'ParentID']
el2.columns = el1.columns
el = pd.concat([el1, el2])
el = el.dropna()  # drop edges whose parent is unknown
# Join each edge back to the child's row (the index still equals the row number in rawdf)
df = el.merge(rawdf, left_index=True, right_index=True, how='left')
df['name'] = df[['First name', 'Last name']].apply(lambda x: ' '.join(x.dropna().astype(str)), axis=1)
df = df.drop(['Child','FID', 'MID', 'First name', 'Last name'], axis=1)
df = df[['ID', 'name', 'S', 'DoB', 'DoD', 'Place of birth', 'Job', 'ParentID']]
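The concat of the two column pairs reshapes the wide FID/MID columns into a long edge list. The same reshaping can be sketched with pandas.melt, which keeps the child's ID next to each parent ID and so allows a merge on ID instead of on the row index. This is an alternative to the code above, not what the question uses, and the three-row frame is a cut-down copy of the table with empty parent cells as None.

```python
import pandas as pd

# Cut-down copy of the table above (empty parent cells as None).
rawdf = pd.DataFrame({
    "ID": ["AnAn", "ViCo92", "MiCo20"],
    "First name": ["Antonio", "Vito", "Michael"],
    "Last name": ["Andolini", "Corleone", "Corleone"],
    "FID": [None, "AnAn", "ViCo92"],
    "MID": [None, "SiAn", "CaCo97"],
})

# One row per (child, parent) pair; 'Role' records whether it came from FID or MID.
edges = rawdf.melt(id_vars=["ID"], value_vars=["FID", "MID"],
                   var_name="Role", value_name="ParentID")
edges = edges.dropna(subset=["ParentID"])  # founders have no recorded parents

# Attach the child's attributes by ID rather than by row position.
df = edges.merge(rawdf.drop(columns=["FID", "MID"]), on="ID", how="left")
df["name"] = df["First name"] + " " + df["Last name"]
print(df[["ID", "name", "Role", "ParentID"]].to_string(index=False))
```

A side benefit of the Role column is that mother and father edges can later be styled differently if desired.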
This gives the following DataFrame:
|--------+----------------------+---+--------+--------+----------------+----------------+----------|
| ID | name | S | DoB | DoD | Place of birth | Job | ParentID |
|--------+----------------------+---+--------+--------+----------------+----------------+----------|
| PaAn87 | Paolo Andolini | M | 1887.0 | 1901.0 | NaN | NaN | SiAn |
| PaAn87 | Paolo Andolini | M | 1887.0 | 1901.0 | NaN | NaN | AnAn |
| ViCo92 | Vito Corleone | M | 1892.0 | 1954.0 | Corleone | godfather | SiAn |
| ViCo92 | Vito Corleone | M | 1892.0 | 1954.0 | Corleone | godfather | AnAn |
| ToHa10 | Tom Hagen | M | 1910.0 | 1970.0 | New York | Consigliere | CaCo97 |
| ToHa10 | Tom Hagen | M | 1910.0 | 1970.0 | New York | Consigliere | ViCo92 |
| SaCo16 | Santino Corleone | M | 1916.0 | 1948.0 | New York | gangster | CaCo97 |
| SaCo16 | Santino Corleone | M | 1916.0 | 1948.0 | New York | gangster | ViCo92 |
| FrCo19 | Frederico Corleone | M | 1919.0 | 1959.0 | New York | Casino Manager | CaCo97 |
| FrCo19 | Frederico Corleone | M | 1919.0 | 1959.0 | New York | Casino Manager | ViCo92 |
| MiCo20 | Michael Corleone | M | 1920.0 | 1997.0 | New York | godfather | CaCo97 |
| MiCo20 | Michael Corleone | M | 1920.0 | 1997.0 | New York | godfather | ViCo92 |
| FrCo37 | Francessa Corleone | F | 1937.0 | NaN | NaN | NaN | SaCo17 |
| FrCo37 | Francessa Corleone | F | 1937.0 | NaN | NaN | NaN | SaCo16 |
| KaCo37 | Kathryn Corleone | F | 1937.0 | NaN | NaN | NaN | SaCo17 |
| KaCo37 | Kathryn Corleone | F | 1937.0 | NaN | NaN | NaN | SaCo16 |
| FrCo40 | Frank Corleone | F | 1940.0 | NaN | NaN | NaN | SaCo17 |
| FrCo40 | Frank Corleone | F | 1940.0 | NaN | NaN | NaN | SaCo16 |
| SaCo45 | Santino Jr. Corleone | M | 1945.0 | NaN | NaN | NaN | SaCo17 |
| SaCo45 | Santino Jr. Corleone | M | 1945.0 | NaN | NaN | NaN | SaCo16 |
| FrHa | Frank Hagen | M | 1940.0 | NaN | NaN | NaN | Th20 |
| FrHa | Frank Hagen | M | 1940.0 | NaN | NaN | NaN | ToHa10 |
| AnHa42 | Andrew Hagen | M | 1942.0 | NaN | NaN | Priest | Th20 |
| AnHa42 | Andrew Hagen | M | 1942.0 | NaN | NaN | Priest | ToHa10 |
| ViMa | Vincent Mancini | M | 1948.0 | NaN | New York | Godfather | LuMa23 |
| ViMa | Vincent Mancini | M | 1948.0 | NaN | New York | Godfather | SaCo16 |
| GiHa58 | Gianna Hagen | F | 1948.0 | NaN | NaN | NaN | Th20 |
| GiHa58 | Gianna Hagen | F | 1948.0 | NaN | NaN | NaN | ToHa10 |
| AnCo51 | Anthony Corleone | M | 1951.0 | NaN | New York | Singer | KaAd24 |
| AnCo51 | Anthony Corleone | M | 1951.0 | NaN | New York | Singer | MiCo20 |
| MaCo53 | Mary Corleone | F | 1953.0 | 1979.0 | New York | Student | KaAd24 |
| MaCo53 | Mary Corleone | F | 1953.0 | 1979.0 | New York | Student | MiCo20 |
| ChHa54 | Christina Hagen | F | 1954.0 | NaN | NaN | NaN | Th20 |
| ChHa54 | Christina Hagen | F | 1954.0 | NaN | NaN | NaN | ToHa10 |
| CoCo27 | Constanzia Corleone | F | 1927.0 | NaN | New York | rentier | CaCo97 |
| CoCo27 | Constanzia Corleone | F | 1927.0 | NaN | New York | rentier | ViCo92 |
| ViRi49 | Victor Rizzi | M | 1949.0 | NaN | New York | NaN | CoCo27 |
| ViRi49 | Victor Rizzi | M | 1949.0 | NaN | New York | NaN | CaRi20 |
| MiRi | Michael Rizzi | M | 1955.0 | NaN | NaN | NaN | CoCo27 |
| MiRi | Michael Rizzi | M | 1955.0 | NaN | NaN | NaN | CaRi20 |
|--------+----------------------+---+--------+--------+----------------+----------------+----------|
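A ParentID that has no row of its own in the original table still turns into a node, because Graphviz creates a node implicitly for every edge endpoint, but that node carries no name or attributes. In the table, for example, the Hagen children list their mother as Th20, while Theresa Hagen's own row has the ID ThHa20. A small sketch for listing such IDs, assuming the original IDs and the edge list are available as shown (only excerpts are reproduced here):

```python
import pandas as pd

# IDs that have a row in the original table (excerpt).
people = pd.Series(["AnAn", "SiAn", "ViCo92", "ToHa10", "ThHa20", "FrHa"])

# Edge list excerpt in the same shape as df above: child ID and ParentID.
edges = pd.DataFrame({
    "ID": ["ViCo92", "ViCo92", "FrHa", "FrHa"],
    "ParentID": ["SiAn", "AnAn", "Th20", "ToHa10"],
})

# Parent IDs without a matching row; these would be drawn as bare nodes.
dangling = sorted(set(edges["ParentID"]) - set(people))
print(dangling)  # ['Th20']
```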
Then I use graphviz to generate the DAG:
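from graphviz import Digraph
# Note: the first positional argument of Digraph is the graph's name, not the layout engine (that would be engine='neato')
f = Digraph('neato', format='pdf', encoding='utf8', filename='corleone', node_attr={'color': 'lightblue2', 'style': 'filled'})
f.attr('node', shape='box')
# One edge per row, drawn from the parent to the child
for index, row in df.iterrows():
    f.edge(str(row["ParentID"]), str(row["ID"]), label='')
f.view()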
The result looks like this:
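Each f.edge(...) call only appends one parent -> child statement to DOT text (the Digraph object exposes it as f.source), which Graphviz then lays out when rendering. A dependency-free sketch that writes roughly equivalent DOT by hand can help when inspecting which edges a loop produces. The helpers quote and to_dot are hypothetical, the quoting is simplified, and unlike the loop above this sketch writes repeated pairs only once.

```python
def quote(node_id):
    """Wrap an ID in double quotes, escaping any quote inside it."""
    return '"' + str(node_id).replace('"', '\\"') + '"'

def to_dot(edges, name="corleone"):
    """Build DOT text for a directed graph from (parent, child) pairs.

    Hypothetical helper: a simplified stand-in for what graphviz.Digraph
    generates, not its exact output. Repeated pairs are written once.
    """
    lines = ["digraph " + quote(name) + " {", "\tnode [shape=box]"]
    seen = set()
    for parent, child in edges:
        if (parent, child) in seen:
            continue
        seen.add((parent, child))
        lines.append("\t" + quote(parent) + " -> " + quote(child))
    lines.append("}")
    return "\n".join(lines)

print(to_dot([("AnAn", "ViCo92"), ("SiAn", "ViCo92"), ("ViCo92", "MiCo20")]))
```

With the DataFrame from the question, the input would be zip(df['ParentID'], df['ID']).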
Best answer
I improved the plot, but it still doesn't fully meet my expectations. Here is the code, with notes on what I changed.
Changes to the data preparation:
- Avoid NaN when reading the CSV: keep_default_na=False
- Instead of dropping rows with an empty ParentID, replace every empty cell with a unique placeholder string (no_entry0, no_entry1, ...), so that people without recorded parents also keep a row in df
import pandas as pd
import numpy as np
# Read empty cells as '' instead of NaN, so every field stays a string
rawdf = pd.read_csv('corleone.csv', keep_default_na=False)
el1 = rawdf[['ID','MID']]
el2 = rawdf[['ID','FID']]
el1.columns = ['Child', 'ParentID']
el2.columns = el1.columns
el = pd.concat([el1, el2])
# Mark empty parent cells as missing
el = el.replace('', np.nan)
# Fill each missing parent with a numbered placeholder; fillna aligns t['tmp'] on the index.
# Assign the result back: chained fillna(..., inplace=True) may not update el in recent pandas.
t = pd.DataFrame({'tmp':['no_entry'+str(i) for i in range(el.shape[0])]})
el['ParentID'] = el['ParentID'].fillna(t['tmp'])
df = el.merge(rawdf, left_index=True, right_index=True, how='left')
df['name'] = df[['First name', 'Last name']].apply(lambda x: ' '.join(x.dropna().astype(str)), axis=1)
df = df.drop(['Child','FID', 'MID', 'First name', 'Last name'], axis=1)
df = df[['ID', 'name', 'S', 'DoB', 'DoD', 'Place of birth', 'Job', 'ParentID']]
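Why the placeholders are numbered: if every missing parent were filled with the same string, all founders would hang off one shared dummy node. A small sketch of the difference, using the same fillna-with-a-Series approach as above (fillna aligns the Series on the index, so both missing parents of row i receive no_entry{i}). The three-person excerpt is stacked the way el is, mother rows first and then father rows.

```python
import numpy as np
import pandas as pd

# Excerpt stacked like el above: mother rows, then father rows, so the
# row index 0..2 appears twice (AnAn and SiAn have no recorded parents).
children = ["AnAn", "SiAn", "ViCo92"]
el = pd.concat([
    pd.DataFrame({"Child": children, "ParentID": [np.nan, np.nan, "SiAn"]}),
    pd.DataFrame({"Child": children, "ParentID": [np.nan, np.nan, "AnAn"]}),
])

# One shared placeholder: both founders would point at the same dummy node.
shared = el["ParentID"].fillna("no_entry")

# Numbered placeholders: fillna aligns t on the index, so row i gets no_entry{i}.
t = pd.Series([f"no_entry{i}" for i in range(len(el))])
numbered = el["ParentID"].fillna(t)

print(sorted(shared.unique()))    # ['AnAn', 'SiAn', 'no_entry']
print(sorted(numbered.unique()))  # ['AnAn', 'SiAn', 'no_entry0', 'no_entry1']
```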
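Changes to the plotting:
- Merge parallel edges and draw them as horizontal and vertical segments: graph_attr={"concentrate": "true", "splines":"ortho"}
- Show name, job, DoB, place of birth and DoD inside each node: label=
- Color each node by sex: _attributes={'color':'lightpink' if row['S']=='F' else 'lightblue' if row['S']=='M' else 'lightgray'}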
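The label in the plotting loop concatenates all five fields, so a person with no job or no date of death still gets blank lines and a lone †. Below is a sketch of a helper that skips empty fields and picks the fill color the same way as the _attributes expression. The name node_style is hypothetical, and a plain dict stands in for a DataFrame row (with keep_default_na=False all fields are strings).

```python
def node_style(row):
    """Return (label, color) for one person.

    Hypothetical helper: row is any mapping with the columns used in the
    answer (name, Job, DoB, Place of birth, DoD, S); empty fields are skipped.
    """
    lines = [row["name"], row["Job"], row["DoB"], row["Place of birth"]]
    if row["DoD"]:
        lines.append("†" + row["DoD"])  # dagger marks the year of death
    label = "\n".join(part for part in lines if part)
    color = {"F": "lightpink", "M": "lightblue"}.get(row["S"], "lightgray")
    return label, color

vito = {"name": "Vito Corleone", "Job": "godfather", "DoB": "1892",
        "Place of birth": "Corleone", "DoD": "1954", "S": "M"}
kay = {"name": "Kay Adams", "Job": "", "DoB": "1934",
       "Place of birth": "", "DoD": "", "S": "F"}
print(node_style(vito))
print(node_style(kay))  # ('Kay Adams\n1934', 'lightpink')
```

Inside the loop this could be used as label, color = node_style(row) followed by f.node(row['ID'], label=label, color=color).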
from graphviz import Digraph
f = Digraph('neato', format='jpg', encoding='utf8', filename='corleone', node_attr={'style': 'filled'}, graph_attr={"concentrate": "true", "splines":"ortho"})
f.attr('node', shape='box')
# One labeled node per person, colored by sex
for index, row in df.iterrows():
    f.node(row['ID'],
           label=row['name'] + '\n' + row['Job'] + '\n' + row['DoB'] + '\n' + row['Place of birth'] + '\n†' + row['DoD'],
           _attributes={'color': 'lightpink' if row['S'] == 'F' else 'lightblue' if row['S'] == 'M' else 'lightgray'})
# One edge per row, from the parent (or its placeholder) to the child
for index, row in df.iterrows():
    f.edge(str(row["ParentID"]), str(row["ID"]), label='')
f.view()
The result looks like this:
Regarding "python-3.x - How to draw a family tree from a pandas DataFrame?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/66823677/