
python - How do I import a csv data file into scikit-learn?

Reposted. Author: IT老高. Updated: 2023-10-28 22:05:50

As I understand it, scikit-learn accepts data in (n_samples, n_features) format, which is a 2D array. Suppose I have data in the form...

Stock prices    indicator1    indicator2
2.0 123 1252
1.0 .. ..
.. . .
.

How do I import it?

Best answer

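A very good alternative to numpy loadtxt is read_csv from Pandas. One big advantage of loading the data into a Pandas DataFrame is that it can handle mixed data types, for example some columns containing text and others containing numbers. You can then easily select only the numeric columns and convert them to a numpy array with to_numpy (as_matrix in older pandas versions, removed in pandas 1.0). Pandas will also read and write Excel files and a bunch of other formats.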
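For comparison, here is a minimal sketch of the numpy loadtxt route that the answer mentions, which works when every column is numeric. The column names and every row after the first are hypothetical placeholders, and io.StringIO stands in for a file on disk:

```python
import io

import numpy as np

# Hypothetical, purely numeric CSV text standing in for a real file.
# Only the first data row (2.0, 123, 1252) appears in the question.
csv_text = """stock_price,indicator1,indicator2
2.0,123,1252
1.0,118,1190
1.5,121,1230
"""

# skiprows=1 skips the header line; delimiter="," splits fields on commas.
data = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)

# One row per sample, one column per CSV field.
print(data.shape)
```

loadtxt returns a single array with one dtype, so it is a poor fit for files that mix text and numeric columns, which is the case read_csv handles.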

If we have a csv file named "mydata.csv":

point_latitude,point_longitude,line,construction,point_granularity
30.102261, -81.711777, Residential, Masonry, 1
30.063936, -81.707664, Residential, Masonry, 3
30.089579, -81.700455, Residential, Wood , 1
30.063236, -81.707703, Residential, Wood , 3
30.060614, -81.702675, Residential, Wood , 1

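The following reads in the csv and converts the numeric columns to a numpy array for scikit-learn, then changes the order of the columns and writes the result to an Excel spreadsheet: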

import numpy as np
import pandas as pd

input_file = "mydata.csv"


# comma delimited is the default
df = pd.read_csv(input_file, header = 0)

# for space delimited use:
# df = pd.read_csv(input_file, header = 0, delimiter = " ")

# for tab delimited use:
# df = pd.read_csv(input_file, header = 0, delimiter = "\t")

# put the original column names in a python list
original_headers = list(df.columns.values)

# keep only the numeric columns (public API instead of the private _get_numeric_data)
df = df.select_dtypes(include="number")

# put the numeric column names in a python list
numeric_headers = list(df.columns.values)

# create a numpy array with the numeric values for input into scikit-learn
# (to_numpy replaces as_matrix, which was removed in pandas 1.0)
numpy_array = df.to_numpy()

# reverse the order of the columns
numeric_headers.reverse()
reverse_df = df[numeric_headers]

# write the reverse_df to an excel spreadsheet
# (.xlsx output needs an Excel writer package such as openpyxl)
reverse_df.to_excel('path_to_file.xlsx')
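To connect this back to the question's layout, a scikit-learn estimator typically takes a 2D feature array of shape (n_samples, n_features) and a separate 1D target array. The following is only a sketch under assumptions: the column name stock_price and every row after the first are hypothetical, and io.StringIO stands in for the real file.

```python
import io

import pandas as pd

# Hypothetical rows; only the first row's values appear in the question.
csv_text = """stock_price,indicator1,indicator2
2.0,123,1252
1.0,118,1190
1.5,121,1230
"""

df = pd.read_csv(io.StringIO(csv_text))

# Target: the column to predict, as a 1D array of shape (n_samples,).
y = df["stock_price"].to_numpy()

# Features: every other column, as a 2D array of shape (n_samples, n_features).
X = df.drop(columns=["stock_price"]).to_numpy()

print(X.shape, y.shape)
```

X and y can then be passed to an estimator's fit method, for example estimator.fit(X, y).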

Regarding python - How do I import a csv data file into scikit-learn?, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/11023411/
