My First Time Using matplotlib
Published: 2019-05-11

I was interested in learning a little more about data science and machine learning algorithms, and one of the most commonly used data sets for introducing the topic is the Iris data set. It contains 150 instances, each labeled with the kind of iris plant it belongs to: iris setosa, iris versicolor or iris virginica. There are 50 instances of each class in the data set. Each instance describes the plant's sepal length, sepal width, petal length and petal width. Given only these 4 values, a classifier should be able to accurately predict which kind of plant an instance is.

This is the iris data set I used for the following plot. It differs from the data set in the UC Irvine Machine Learning Repository only by an additional first line in the CSV file describing what each column signifies:

This is what the beginning of the CSV file looks like when you open it in a text editor:

```
sepal-length,sepal-width,petal-length,petal-width,classification
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
⋮
```
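
Because the first line names the columns, Python's standard csv.DictReader can use it directly as field names. A minimal sketch, assuming the layout shown above; load_iris_rows is a hypothetical helper name, and io.StringIO stands in for the real file (use open('iris.csv', newline='') for that). On the full file, counting the labels this way should give 50 per class.

```python
import csv
import io
from collections import Counter


def load_iris_rows(lines):
    """Parse iris CSV lines whose first line is the column header.

    Returns a list of (features, label) tuples: features holds the four
    measurements as floats, label is the classification string.
    Blank lines (e.g. at the end of the file) are skipped by DictReader.
    """
    rows = []
    for record in csv.DictReader(lines):
        features = [
            float(record["sepal-length"]),
            float(record["sepal-width"]),
            float(record["petal-length"]),
            float(record["petal-width"]),
        ]
        rows.append((features, record["classification"]))
    return rows


# Usage with the first lines of the file shown above.
sample = io.StringIO(
    "sepal-length,sepal-width,petal-length,petal-width,classification\n"
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "4.7,3.2,1.3,0.2,Iris-setosa\n"
)
rows = load_iris_rows(sample)
print(Counter(label for _, label in rows))  # Counter({'Iris-setosa': 3})
```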

Matplotlib

I decided to use the iris data set to learn more about how to use matplotlib. Matplotlib is an open source project that lets you create 2D plots in Python. You can create many different kinds of charts, graphs and even animations. This is very useful, since a good visualization of a data set can help you recognize, with the naked eye, which features of an instance are most relevant for determining its classification.

Luckily, matplotlib offers an extensive library of examples on its website, together with the images or animations the code produces, which made it easy for me to find what I was looking for. This is what I was able to produce in my first hour of using matplotlib:

Scatter Graph of the Iris data set

Here is the complete code:

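```python
import numpy as np
import matplotlib.patches as mpatches
import matplotlib.pyplot as plt

"""
Desc:
    1. sepal-length
    2. sepal-width
    3. petal-length
    4. petal-width
    5. classification
        - Iris-setosa
        - Iris-versicolor
        - Iris-virginica
"""

# Read the CSV file. skip_header=1 skips the description line before type
# inference, so dtype=None infers the four measurement columns as floats;
# encoding='utf-8' returns the class labels as str instead of bytes.
dataset = np.genfromtxt('../iris.csv', delimiter=',', dtype=None,
                        skip_header=1, encoding='utf-8')

# Each row is a record: fields 0-3 are the measurements, field 4 the class.
lengths = [row.tolist()[0:4] for row in dataset]
flower_type = [row[4] for row in dataset]

# Plot color for each class.
colors = {
    "Iris-setosa": "red",
    "Iris-versicolor": "green",
    "Iris-virginica": "blue",
}
scale = 100.0

# Loop over all instances (range(0, len(lengths)-1) skipped the last one).
for i in range(len(lengths)):
    x, y = lengths[i][0], lengths[i][1]  # sepal length, sepal width
    color = colors[flower_type[i]]
    plt.scatter(x, y, s=scale, c=color, alpha=1, edgecolor="none")

# Legend built from explicit color patches. plt.legend() is called only
# once: a second, bare call would replace this legend with an empty one,
# because none of the plotted points carries a label.
red_patch = mpatches.Patch(color='red', label='iris setosa')
green_patch = mpatches.Patch(color='green', label='iris versicolor')
blue_patch = mpatches.Patch(color='blue', label='iris virginica')
plt.legend(handles=[red_patch, green_patch, blue_patch])

plt.title("The Iris Data Set", fontsize=18)
plt.xlabel('sepal length', fontsize=15)
plt.ylabel('sepal width', fontsize=15)
plt.grid(True)
plt.show()
```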
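
The script calls plt.scatter once per instance and builds the legend from hand-made patches. One alternative is a single scatter call per class with a label= argument, so that legend() collects the entries by itself. A sketch of that variant, assuming the data has already been parsed into (sepal_length, sepal_width, label) tuples with str labels; group_by_class and plot_sepals are hypothetical names, and the Agg backend is chosen only so the sketch also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove it to get a window
import matplotlib.pyplot as plt

# Same class-to-color mapping as in the script.
COLORS = {
    "Iris-setosa": "red",
    "Iris-versicolor": "green",
    "Iris-virginica": "blue",
}


def group_by_class(rows):
    """Split (x, y, label) tuples into {label: (xs, ys)} lists."""
    groups = {}
    for x, y, label in rows:
        xs, ys = groups.setdefault(label, ([], []))
        xs.append(x)
        ys.append(y)
    return groups


def plot_sepals(rows):
    """Draw one scatter call per class and return the Figure."""
    fig, ax = plt.subplots()
    for label, (xs, ys) in group_by_class(rows).items():
        ax.scatter(xs, ys, s=100.0, c=COLORS[label], edgecolors="none",
                   label=label.replace("Iris-", "iris "))
    ax.set_title("The Iris Data Set", fontsize=18)
    ax.set_xlabel("sepal length", fontsize=15)
    ax.set_ylabel("sepal width", fontsize=15)
    ax.legend()  # entries come from the label= arguments above
    ax.grid(True)
    return fig
```

With the dataset array from the script, the rows could be built as [(row[0], row[1], row[4]) for row in dataset] and the result saved with fig.savefig(...). Plotting per class also means matplotlib draws three collections instead of one per point.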

I also used the NumPy module for its genfromtxt() function, which offers an easy way to read data from a CSV file. Notice that I skipped the first line of the CSV file, since that line only describes the columns. The rest of the code is fairly self-explanatory and not too complicated.
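
genfromtxt can skip the description line itself through its skip_header parameter. Dropping the header before type inference matters with dtype=None: otherwise the header string in each column forces NumPy to treat that whole column as text rather than floats. A small sketch on an in-memory copy of the first lines of the file (io.StringIO stands in for the '../iris.csv' path):

```python
import io

import numpy as np

# A few lines in the same layout as iris.csv (header + data rows).
sample = io.StringIO(
    "sepal-length,sepal-width,petal-length,petal-width,classification\n"
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
)

# skip_header=1 drops the description line before dtypes are inferred;
# encoding="utf-8" returns the labels as str rather than bytes.
data = np.genfromtxt(sample, delimiter=",", dtype=None,
                     skip_header=1, encoding="utf-8")

print(data.dtype.names)  # ('f0', 'f1', 'f2', 'f3', 'f4')
print(data["f0"])        # [5.1 4.9]
print(data["f4"])        # ['Iris-setosa' 'Iris-setosa']
```

Because the columns have mixed types, the result is a structured array with default field names f0 to f4, so a whole column can be selected by name.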

We can see that there are two clusters in the scatter plot. Almost all iris setosa instances (red), apart from a few outliers, cluster around the top left of the graph. The iris versicolor and iris virginica instances (green and blue respectively) are mixed together in one bigger cluster. This shows that sepal width and sepal length are good features for determining whether an instance is of type iris setosa, but would not perform very well at differentiating between iris versicolor and iris virginica.
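
A rough numeric check of this visual impression is to compare the per-class averages of the two sepal features: a setosa mean far from the other two, with the versicolor and virginica means close together, would be consistent with the picture (means alone say nothing about how much the clusters overlap). A sketch, assuming rows of (sepal_length, sepal_width, label) tuples; sepal_means is a hypothetical helper name.

```python
from collections import defaultdict


def sepal_means(rows):
    """Average (sepal length, sepal width) per class.

    rows: iterable of (sepal_length, sepal_width, label) tuples.
    Returns {label: (mean_length, mean_width)}.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # [sum_length, sum_width, count]
    for length, width, label in rows:
        acc = sums[label]
        acc[0] += length
        acc[1] += width
        acc[2] += 1
    return {label: (l / n, w / n) for label, (l, w, n) in sums.items()}


# Usage with the three setosa rows shown at the top of the CSV file.
print(sepal_means([(5.1, 3.5, "Iris-setosa"),
                   (4.9, 3.0, "Iris-setosa"),
                   (4.7, 3.2, "Iris-setosa")]))
```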

Repost URL: http://ovqwd.baihongyu.com/
