I have created a Pandas DataFrame and plotted it using .hist().
I want to be able to draw lines/curves on top of the same figure.
How can I do that?
I was able to plot my data as a histogram using axis = df.hist(column='Example', bins=15), which assigns the returned object to axis.
I thought I might be able to plot a line using ax=axis as an argument. But this isn't valid.
It appears that plt.plot takes different kwargs to DataFrame.hist. Multiple sets of data from a DataFrame can be plotted on the same figure, as histograms, using .hist() in combination with an argument of ax=axis.
Here is some example code, taken from a Jupyter Notebook, plus some data to play with.
data = [211995, 139950, 202995, 223000, 184995, 82000, 127000, 240000, 116000, 74500, 151000, 149000, 290000, 146000, 174500, 418000, 150000, 150000, 260000, 100000, 282500, 510000, 142000, 382000, 220000, 259000, 330000, 177500, 290000, 280000, 118000, 97000, 124000, 385000, 199950, 90000, 135000, 395000, 182000, 105000, 80000, 230000, 227950, 176995, 110000, 142000, 132500, 100000, 95000, 257500, 186000, 230000, 169995, 167995, 119950, 119950, 361000, 125000, 242000, 240000, 205000, 187500, 180000, 146000, 257995, 380000, 144995, 139995, 159995, 265000, 288000, 288000, 162500, 290000, 182737, 235000, 250000, 175000, 153000, 125000, 170000, 165000, 187995, 250000, 220000, 108750, 125000, 245000, 100000, 130000, 115000, 218000, 190000, 435000, 300000, 465000, 179950, 259500, 187000, 200000]
import pandas as pd
import matplotlib.pyplot as plt
import numpy

df = pd.DataFrame(data)  # the list defined above
df.columns = ['Example']
axis = df.hist(column='Example', bins=15)

x = numpy.linspace(1e5, 5e5, 20)

def f(x):
    return x * numpy.exp(-x)

y = f(x)
plt.plot(x, y, axis)  # the attempt that fails: axis is not a valid argument here
- The following explains how to use the object returned by a pandas histogram plot to plot into a specific axes.
- There are three pandas plotting methods for creating a histogram:
  - pandas.DataFrame.hist, which returns matplotlib.AxesSubplot or numpy.ndarray of them
  - pandas.DataFrame.plot.hist, which returns matplotlib.AxesSubplot
  - pandas.DataFrame.plot with kind='hist', which returns matplotlib.axes.Axes or numpy.ndarray of them
- The pandas plotting API is very versatile, so it's recommended to always check the type of object returned by the plot.
- It is better to use the explicit axes interface than to switch to the implicit pyplot interface, as per "Why be explicit?".
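To make the "check the returned type" advice concrete, here is a minimal sketch. It assumes a small made-up DataFrame (columns a and b are placeholders) and a hypothetical helper, as_axes_list, that is not part of pandas or matplotlib; it simply turns either kind of return value into a flat list of axes.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def as_axes_list(returned):
    """Hypothetical helper: flatten the object returned by a pandas plot call."""
    if isinstance(returned, np.ndarray):
        return list(returned.ravel())  # array of axes, any shape
    return [returned]                  # a single axes


# Made-up data for illustration only
df = pd.DataFrame({"a": [1, 2, 2, 3, 3, 3], "b": [4, 4, 5, 5, 6, 6]})

hist_result = df.hist()       # DataFrame.hist
plot_result = df.plot.hist()  # DataFrame.plot.hist

# Inspect what each call actually returned before indexing into it
for name, result in [("df.hist", hist_result), ("df.plot.hist", plot_result)]:
    print(name, type(result).__name__, len(as_axes_list(result)))

# With a flat list, the explicit interface works the same way in both cases
for ax in as_axes_list(hist_result):
    ax.axhline(y=1, c="r")
```

Checking the type first avoids errors such as calling axhline on an array, or indexing into a single axes.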
Data and Imports
import pandas as pd
import seaborn as sns # for data
import matplotlib.pyplot as plt
df = sns.load_dataset('penguins')
  species     island  bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g     sex
0  Adelie  Torgersen            39.1           18.7              181.0       3750.0    Male
1  Adelie  Torgersen            39.5           17.4              186.0       3800.0  Female
2  Adelie  Torgersen            40.3           18.0              195.0       3250.0  Female
3  Adelie  Torgersen             NaN            NaN                NaN          NaN     NaN
4  Adelie  Torgersen            36.7           19.3              193.0       3450.0  Female
df.hist
axes = df.hist(['bill_length_mm', 'bill_depth_mm'])
# draws on first axes
axes[0][0].axhline(y=30, c='r')
# draws on second axes
axes[0][1].axhline(y=30, c='purple')
# only drawn on the last plot
plt.axhline(y=20, c='k')
# returns an array of axes
axes
[out]:
array([[<Axes: title={'center': 'bill_length_mm'}>, <Axes: title={'center': 'bill_depth_mm'}>]], dtype=object)
# with only a single column
axes = df.hist('bill_length_mm')
# index to the axes
axes[0][0].axhline(y=30, c='r')
# switching to the implicit interface works, but is not the recommended way
plt.axhline(y=20, c='k')
# still an array with a single axes
axes
[out]:
array([[<Axes: title={'center': 'bill_length_mm'}>]], dtype=object)
df.plot.hist
Single axes
ax = df['bill_length_mm'].plot.hist()
# or
ax = df.plot.hist()
# or
ax = df.plot.hist(y='bill_length_mm')
# plot on the axes
ax.axhline(y=30, c='k')
# returns a single axes
ax
[out]:
<Axes: ylabel='Frequency'>
Array of axes
axes = df.plot.hist(by='sex')
# draws on first axes
axes[0].axhline(y=30, c='r')
# draws on second axes
axes[1].axhline(y=30, c='purple')
# only drawn on the last plot
plt.axhline(y=20, c='k')
# returns an array of axes
axes
[out]:
array([<Axes: title={'center': 'Female'}, ylabel='Frequency'>, <Axes: title={'center': 'Male'}, ylabel='Frequency'>], dtype=object)
df.plot(kind='hist')
Single axes
ax = df['bill_length_mm'].plot(kind='hist')
# or
ax = df.plot(kind='hist')
# or
ax = df.plot(kind='hist', y='bill_length_mm')
# plot on the axes
ax.axhline(y=30, c='k')
# returns a single axes
ax
[out]:
<Axes: ylabel='Frequency'>
Array of axes
axes = df.plot(kind='hist', by='sex')
# draws on first axes
axes[0].axhline(y=30, c='r')
# draws on second axes
axes[1].axhline(y=30, c='purple')
# only drawn on the last plot
plt.axhline(y=20, c='k')
# returns an array of axes
axes
[out]:
array([<Axes: title={'center': 'Female'}, ylabel='Frequency'>, <Axes: title={'center': 'Male'}, ylabel='Frequency'>], dtype=object)
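When a call returns an array of axes, iterating over the flattened array reaches every subplot, whereas plt.axhline only reaches the last one. The following is a minimal sketch of that idea, assuming a small made-up DataFrame (columns a and b are placeholders, not the penguins data). It draws each column's mean as a vertical line and relies on df.hist titling each subplot with its column name, as the outputs above show.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data for illustration only
df = pd.DataFrame({
    "a": [1, 2, 2, 3, 3, 3, 4],
    "b": [10, 12, 12, 13, 15, 15, 18],
})

axes = df.hist(column=["a", "b"], bins=4)  # 2D array of axes

# axes.flat walks every axes regardless of the array's shape
for ax in axes.flat:
    col = ax.get_title()  # df.hist sets the subplot title to the column name
    ax.axvline(x=df[col].mean(), c="r", ls="--")
```

Looking up the column through the title keeps the loop independent of the order in which the subplots were created.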
It's just this. Apparently the axis argument is not required: plt.plot draws on the current axes, which here is the histogram that df.hist just created.
plt.plot(x, y)
plt.show()
Full example:
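df = pd.DataFrame(data)  # data as defined in the question
df.columns = ['Example']
axis = df.hist(column='Example', bins=15)
x = numpy.linspace(1e5, 5e5, 20)

def f(x):
    # x is rescaled by 1e-5; without it, exp(-x) underflows to zero for x around 1e5
    return 1.0e-5 * x * numpy.exp(-1.0e-5 * x)

y = f(x)
plt.plot(x, y)  # draws on the current axes, i.e. the histogram
plt.show()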
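The same result can be reached through the explicit interface recommended in the other answer: index the array returned by df.hist and call plot on that axes. This is a minimal sketch, assuming a shortened copy of the question's list (its first ten values) and the rescaled curve from the example above.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# First ten values of the question's list, shortened for illustration
example_data = [211995, 139950, 202995, 223000, 184995,
                82000, 127000, 240000, 116000, 74500]
df = pd.DataFrame(example_data, columns=["Example"])

axes = df.hist(column="Example", bins=15)  # array of axes, even for one column
ax = axes[0][0]                            # the single histogram axes

x = np.linspace(1e5, 5e5, 20)
y = 1.0e-5 * x * np.exp(-1.0e-5 * x)

ax.plot(x, y, c="r")  # drawn on this specific axes, not on whichever is current
```

In a script, a call to plt.show() after the last line displays the figure; in a notebook the figure is rendered automatically.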