卓越飞翔博客

How to convert the Scikit-learn IRIS dataset into a dataset with only two features in Python?


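Iris, a multivariate flower dataset, is one of the most useful Python scikit-learn datasets. It contains 3 classes of 50 instances each, with sepal and petal measurements for three iris species (Iris setosa, Iris virginica, and Iris versicolor). Each instance has four features: sepal length (cm), sepal width (cm), petal length (cm), and petal width (cm).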
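As a quick sanity check of the structure described above, the following minimal sketch loads the dataset and counts the instances per class. The variable names (n_samples, n_features, class_counts) are illustrative choices, not part of the original article.

```python
import numpy as np
from sklearn import datasets

# Load the iris dataset bundled with scikit-learn
iris = datasets.load_iris()

# The feature matrix has one row per flower and one column per measurement
n_samples, n_features = iris.data.shape

# Count how many instances carry each class label (0, 1, 2)
class_counts = np.bincount(iris.target)

print(f"Samples: {n_samples}, features: {n_features}")
for name, count in zip(iris.target_names, class_counts):
    print(f"{name}: {count}")
```

Each of the three species should report the same number of instances, which is why the dataset is often described as balanced.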

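We can use principal component analysis (PCA) to transform the IRIS dataset into a new feature space with 2 features.

Steps

We can follow the steps given below to transform the IRIS dataset into a 2-feature dataset using PCA in Python -

Step 1 - First, import the necessary packages from scikit-learn. We need the datasets and decomposition packages.

Step 2 - Load the IRIS dataset.

Step 3 - Print details about the dataset.

Step 4 - Initialize principal component analysis (PCA) and apply the fit() function to fit the data.

Step 5 - Transform the dataset to the new dimensions, i.e., a 2-feature dataset.

Example

In the example below, we use the steps above to transform the scikit-learn IRIS plant dataset into 2 features with PCA.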

# Importing the necessary packages
from sklearn import datasets
from sklearn import decomposition

# Load iris plant dataset
iris = datasets.load_iris()

# Print details about the dataset
print('Features names : ' + str(iris.feature_names))
print('\n')
print('Features size : ' + str(iris.data.shape))
print('\n')
print('Target names : ' + str(iris.target_names))
print('\n')
X_iris, Y_iris = iris.data, iris.target

# Initialize PCA and fit the data
pca_2 = decomposition.PCA(n_components=2)
pca_2.fit(X_iris)

# Transforming iris data to new dimensions (with 2 features)
X_iris_pca2 = pca_2.transform(X_iris)

# Printing new dataset
print('New Dataset size after transformations: ', X_iris_pca2.shape)

Output

It will produce the following output -

Features names : ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']

Features size : (150, 4)

Target names : ['setosa' 'versicolor' 'virginica']

New Dataset size after transformations: (150, 2)
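The fit and transform steps can also be combined in a single call. The sketch below is an alternative to the two-step version, not the article's own code: it uses fit_transform() and then sums explained_variance_ratio_ to see how much of the original variance the 2 features keep. The name retained is an illustrative choice.

```python
from sklearn import datasets, decomposition

# Load only the feature matrix of the iris dataset
X_iris = datasets.load_iris().data

# Fit PCA and project the data in one call
pca_2 = decomposition.PCA(n_components=2)
X_iris_pca2 = pca_2.fit_transform(X_iris)

# Fraction of the total variance kept by the two components together
retained = pca_2.explained_variance_ratio_.sum()

print("New dataset size:", X_iris_pca2.shape)
print(f"Variance retained by 2 components: {retained:.4f}")
```

Checking the retained variance is a simple way to judge whether 2 features are enough for a given task before discarding the other dimensions.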

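How to convert the Iris dataset into a 3-feature dataset?

We can use the statistical method called principal component analysis (PCA) to transform the Iris dataset into a new feature space with 3 features. PCA analyzes the features of the original dataset and linearly projects the data into a new feature space.

The main idea behind PCA is to find the "principal" directions of the data, those along which the variance is greatest, and to build new features from them. The result is a dataset with fewer dimensions that retains most of the variance of the original dataset, though not necessarily all of its information.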
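To make the linear projection concrete, here is a minimal sketch that reproduces the transform by hand with NumPy: subtract the mean learned during fitting, then multiply by the transposed component matrix. It assumes the default whiten=False setting; X_manual and X_sklearn are illustrative names.

```python
import numpy as np
from sklearn import datasets, decomposition

# Load the feature matrix and fit a 3-component PCA
X = datasets.load_iris().data
pca = decomposition.PCA(n_components=3).fit(X)

# Manual projection: center the data, then project onto each component
# (components_ has one principal direction per row)
X_manual = (X - pca.mean_) @ pca.components_.T

# Projection computed by scikit-learn itself
X_sklearn = pca.transform(X)

# The two results should agree up to floating-point error
print("Manual projection matches transform():", np.allclose(X_manual, X_sklearn))
```

Each new feature is therefore a weighted sum of the four centered measurements, with the weights given by one row of components_.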

Example

In the example below, we use PCA (initialized with 3 components) to transform the scikit-learn Iris plant dataset.

# Importing the necessary packages
from sklearn import datasets
from sklearn import decomposition

# Load iris plant dataset
iris = datasets.load_iris()

# Print details about the dataset
print('Features names : ' + str(iris.feature_names))
print('\n')
print('Features size : ' + str(iris.data.shape))
print('\n')
print('Target names : ' + str(iris.target_names))
print('\n')
print('Target size : ' + str(iris.target.shape))
X_iris, Y_iris = iris.data, iris.target

# Initialize PCA and fit the data
pca_3 = decomposition.PCA(n_components=3)
pca_3.fit(X_iris)

# Transforming iris data to new dimensions (with 3 features)
X_iris_pca3 = pca_3.transform(X_iris)

# Printing new dataset
print('New Dataset size after transformations : ', X_iris_pca3.shape)
print('\n')

# Getting the directions of maximum variance in data
print("Components : ", pca_3.components_)
print('\n')

# Getting the amount of variance explained by each component
print("Explained Variance:", pca_3.explained_variance_)
print('\n')

# Getting the fraction of variance explained by each component
print("Explained Variance Ratio:", pca_3.explained_variance_ratio_)
print('\n')

# Getting the singular values for each component
print("Singular Values :", pca_3.singular_values_)
print('\n')

# Getting the estimated noise variance
print("Noise Variance :", pca_3.noise_variance_)

Output

It will produce the following output -

Features names : ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']

Features size : (150, 4)

Target names : ['setosa' 'versicolor' 'virginica']

Target size : (150,)
New Dataset size after transformations : (150, 3)

Components : [[ 0.36138659 -0.08452251  0.85667061  0.3582892 ]
 [ 0.65658877  0.73016143 -0.17337266 -0.07548102]
 [-0.58202985  0.59791083  0.07623608  0.54583143]]

Explained Variance: [4.22824171 0.24267075 0.0782095 ]

Explained Variance Ratio: [0.92461872 0.05306648 0.01710261]

Singular Values : [25.09996044 6.01314738 3.41368064]

Noise Variance : 0.02383509297344944
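The explained variance ratios can also guide the choice between 2 and 3 features. The following sketch, which is an addition and not part of the original example, fits PCA with all 4 components and accumulates the ratios. The 0.95 threshold and the names cumulative and n_needed are assumptions chosen for illustration.

```python
import numpy as np
from sklearn import datasets, decomposition

# Fit PCA keeping all 4 components so every ratio is available
X_iris = datasets.load_iris().data
pca_full = decomposition.PCA(n_components=4).fit(X_iris)

# Running total of the variance explained by the first k components
cumulative = np.cumsum(pca_full.explained_variance_ratio_)

# Smallest number of components whose cumulative ratio reaches the threshold
# (0.95 is an arbitrary, hypothetical cut-off)
threshold = 0.95
n_needed = int(np.searchsorted(cumulative, threshold) + 1)

for k, total in enumerate(cumulative, start=1):
    print(f"{k} component(s): {total:.4f}")
print(f"Components needed for {threshold:.0%} of the variance: {n_needed}")
```

A stricter threshold would push the count higher, so the cut-off should be picked with the downstream task in mind.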
