I am new to Python and I want to create a DataFrame from the two lists below.
my_foldername = ['folder1','folder2']
my_filetype = ['avi.txt','bmp.txt','exe.txt','avi.txt','bmp.txt','exe.txt']
Here is my current line of code.
df = pd.DataFrame(list(zip(my_foldername, my_filetype)))
I would like to have this kind of output.
| FolderName | AVI     | BMP     | EXE     |
| ---------- | ------- | ------- | ------- |
| folder1    | avi.txt | bmp.txt | exe.txt |
| folder2    | avi.txt | bmp.txt | exe.txt |
If you surround your code in ``` ``` it will render as code in the post. Please edit so your code is readable.
You can use zip as you tried to, but you first need to break the second list into chunks. In Python 3.12+ this is easily done with itertools.batched:
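import pandas as pd
from itertools import batched  # requires Python 3.12+

# Number of file names that belong to each folder
n = len(my_filetype) // len(my_foldername)
# Pair every folder with its chunk of n file names
out = pd.DataFrame([[f, *t] for f, t
                    in zip(my_foldername, batched(my_filetype, n))],
                   columns=['FolderName', 'AVI', 'BMP', 'EXE'])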
Output:
  FolderName      AVI      BMP      EXE
0    folder1  avi.txt  bmp.txt  exe.txt
1    folder2  avi.txt  bmp.txt  exe.txt
If you don't have the latest Python version (older than 3.12), use the batched recipe:
from itertools import islice

def batched(iterable, n):
    # batched('ABCDEFG', 3) --> ABC DEF G
    if n < 1:
        raise ValueError('n must be at least one')
    it = iter(iterable)
    while batch := tuple(islice(it, n)):
        yield batch
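If the same script has to run on both old and new interpreters, the two variants above can be combined. This is only a sketch, assuming Python 3.8+ (the recipe uses the walrus operator); the name rows is introduced here for illustration. It stops at a plain list of rows, which can then be passed to pd.DataFrame together with the same columns argument as before.

```python
try:
    from itertools import batched  # available in Python 3.12+
except ImportError:
    from itertools import islice

    def batched(iterable, n):
        # Fallback recipe: batched('ABCDEFG', 3) --> ABC DEF G
        if n < 1:
            raise ValueError('n must be at least one')
        it = iter(iterable)
        while batch := tuple(islice(it, n)):
            yield batch

my_foldername = ['folder1', 'folder2']
my_filetype = ['avi.txt', 'bmp.txt', 'exe.txt', 'avi.txt', 'bmp.txt', 'exe.txt']

# Number of file names per folder
n = len(my_filetype) // len(my_foldername)

# One row per folder: the folder name followed by its chunk of file names
rows = [[folder, *chunk]
        for folder, chunk in zip(my_foldername, batched(my_filetype, n))]
print(rows)
```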
Alternatively, with a simple loop and enumerate:
n = len(my_filetype) // len(my_foldername)
out = pd.DataFrame([[f, *my_filetype[i*n:(i+1)*n]] for i, f
                    in enumerate(my_foldername)],
                   columns=['FolderName', 'AVI', 'BMP', 'EXE'])
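Note that the slicing relies on the file list splitting evenly across the folders; with integer division, any leftover elements are simply never used. A minimal sketch of a guard, assuming you would rather fail loudly (rows_by_folder is a hypothetical helper name, not part of pandas):

```python
def rows_by_folder(folders, files):
    """Hypothetical helper: pair each folder with its equal-sized slice of files."""
    if not folders or len(files) % len(folders) != 0:
        raise ValueError('files must split evenly across folders')
    n = len(files) // len(folders)  # file names per folder
    return [[folder, *files[i * n:(i + 1) * n]]
            for i, folder in enumerate(folders)]

my_foldername = ['folder1', 'folder2']
my_filetype = ['avi.txt', 'bmp.txt', 'exe.txt', 'avi.txt', 'bmp.txt', 'exe.txt']

rows = rows_by_folder(my_foldername, my_filetype)
print(rows)
```

The returned list can be handed to pd.DataFrame with the same columns argument as in the snippet above.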
You can use numpy.reshape to change my_filetype into a 2 x 3 array:
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.reshape(my_filetype, (2, 3)),
    columns=["AVI", "BMP", "EXE"],
    index=pd.Series(my_foldername, name="FolderName"),
).reset_index()
If my_filetype has many more elements and you don't want to calculate the number of rows manually, you can supply -1 to tell numpy to do it for you:
np.reshape(my_filetype, (-1, 3))
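Put together, the -1 variant might look like the sketch below. It assumes the column names live in a list called columns (a name introduced here for illustration), so the row width is taken from len(columns) rather than hard-coded. numpy raises a ValueError if the list cannot be split into rows of that width.

```python
import numpy as np
import pandas as pd

my_foldername = ['folder1', 'folder2']
my_filetype = ['avi.txt', 'bmp.txt', 'exe.txt', 'avi.txt', 'bmp.txt', 'exe.txt']
columns = ['AVI', 'BMP', 'EXE']  # illustrative name, not from the original post

df = pd.DataFrame(
    # -1 lets numpy infer the number of rows from the list length
    np.reshape(my_filetype, (-1, len(columns))),
    columns=columns,
    index=pd.Series(my_foldername, name='FolderName'),
).reset_index()  # turn the FolderName index into a regular column
print(df)
```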
You can first create a dictionary and then convert it into a DataFrame using the Pandas library as follows:
import pandas as pd

my_foldername = ['folder1', 'folder2']
my_filetype = ['avi.txt', 'bmp.txt', 'exe.txt', 'avi.txt', 'bmp.txt', 'exe.txt']

# Group file names by the upper-cased prefix before the first dot
data = {}
for file_type in my_filetype:
    file_extension = file_type.split('.')[0].upper()
    data[file_extension] = data.get(file_extension, []) + [file_type]

df = pd.DataFrame(data)
df.insert(0, 'FolderName', my_foldername)
df = df.fillna('')
print(df)
This is the output for the above code:
  FolderName      AVI      BMP      EXE
0    folder1  avi.txt  bmp.txt  exe.txt
1    folder2  avi.txt  bmp.txt  exe.txt
Using len(columns) instead of 3 might be more generic too.