Overview
30.Word cloud
30.1.Example 1: Basic word cloud
31.3D plotting
31.1.Plotting 2D data on a 3D plot
31.2.3D scatterplot
31.3.3D surface (color map)
32.Matrix visualization (Matshow)
33.Plotting a confusion matrix
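30.Word cloud
A word cloud is a visual representation of text data: a colored, cloud-like figure made up of words. Unlike bar charts, line charts, pie charts and other charts that display numeric data, a word cloud can present a large amount of text. The importance of each word is encoded by its font size: the larger the font, the more prominent and the more important the word. A word cloud lets the reader spot the most prominent words quickly and grasp the key points.
A word cloud visualizes the "keywords" that occur most frequently in a text. It filters out the bulk of low-frequency, low-value text, so a reader can grasp the gist of the text at a glance.
The wordcloud module must be installed: pip install wordcloud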
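The idea that font size follows word frequency can be sketched with the standard library alone. This is only an illustration under simple assumptions (lowercasing, a tiny made-up stop-word set, and a linear mapping from count to size); it is not how the wordcloud package is implemented, and the names word_frequencies and font_sizes are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word set for this sketch.
STOPWORDS = {"is", "a", "an", "the", "to", "and", "of", "in"}


def word_frequencies(text, min_count=1):
    """Count words, dropping stop words and words rarer than min_count."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return {w: c for w, c in counts.items() if c >= min_count}


def font_sizes(freqs, min_size=10, max_size=40):
    """Map each count linearly onto [min_size, max_size] (assumed scaling)."""
    top = max(freqs.values())
    low = min(freqs.values())
    span = max(top - low, 1)  # avoid division by zero when all counts are equal
    return {w: min_size + (c - low) * (max_size - min_size) / span
            for w, c in freqs.items()}


sample = "data science is a science of data and data tells the story"
freqs = word_frequencies(sample)
sizes = font_sizes(freqs)
for word, size in sorted(sizes.items(), key=lambda kv: -kv[1]):
    print(word, freqs[word], size)
```

Raising min_count plays the role of filtering out low-frequency words: word_frequencies(sample, min_count=2) keeps only the words that occur at least twice.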
30.1.Example 1: Basic word cloud
from wordcloud import WordCloud
import matplotlib.pyplot as plt
text = ('Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from data in various forms, both structured and unstructured, similar to data mining. Data science is a concept to unify statistics, data analysis, machine learning and their related methods in order to understand and analyze actual phenomena with data. It employs techniques and theories drawn from many fields within the context of mathematics, statistics, information science, and computer science. Turing award winner Jim Gray imagined data science as a fourth paradigm of science (empirical, theoretical, computational and now data-driven) and asserted that everything about science is changing because of the impact of information technology and the data deluge. In 2012, when Harvard Business Review called it The Sexiest Job of the 21st Century, the term data science became a buzzword. It is now often used interchangeably with earlier concepts like business analytics, business intelligence, predictive modeling, and statistics. Even the suggestion that data science is sexy was paraphrasing Hans Rosling, featured in a 2011 BBC documentary with the quote, Statistics is now the sexiest subject around. Nate Silver referred to data science as a sexed up term for statistics. In many cases, earlier approaches and solutions are now simply rebranded as data science to be more attractive, which can cause the term to become dilute beyond usefulness. While many university programs now offer a data science degree, there exists no consensus on a definition or suitable curriculum contents. To its discredit, however, many data-science and big-data projects fail to deliver useful results, often as a result of poor management and utilization of resources')
wordcloud = WordCloud(width=1280, height=853, margin=0, colormap='Blues').generate(text)
plt.figure(figsize=(13, 8.6))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.margins(x=0, y=0)
plt.show()
Changing the size and other options
from wordcloud import WordCloud
import matplotlib.pyplot as plt
# Create the input text: a string of space-separated words
text = (
"Python Python Python Matplotlib Matplotlib Seaborn Network Plot Violin Chart Pandas Datascience Wordcloud Spider Radar Parrallel Alpha Color Brewer Density Scatter Barplot Barplot Boxplot Violinplot Treemap Stacked Area Chart Chart Visualization Dataviz Donut Pie Time-Series Wordcloud Wordcloud Sankey Bubble")
# Create the wordcloud object
wordcloud = WordCloud(width=480, height=480, margin=0).generate(text)
# Display the generated image:
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.margins(x=0, y=0)
plt.show()
Customizing the word cloud:
from wordcloud import WordCloud
import matplotlib.pyplot as plt
# Create the input text: a string of space-separated words
text = (
"Python Python Python Matplotlib Matplotlib Seaborn Network Plot Violin Chart Pandas Datascience Wordcloud Spider Radar Parrallel Alpha Color Brewer Density Scatter Barplot Barplot Boxplot Violinplot Treemap Stacked Area Chart Chart Visualization Dataviz Donut Pie Time-Series Wordcloud Wordcloud Sankey Bubble")
# Create the wordcloud object
wordcloud = WordCloud(width=480, height=480, max_font_size=20, min_font_size=10).generate(text)
plt.figure()
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.margins(x=0, y=0)
plt.show()
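You can set the maximum number of words shown in the tag cloud. Suppose you only want to show the 3 most frequent words:
from wordcloud import WordCloud
import matplotlib.pyplot as plt
# Create the input text: a string of space-separated words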
text = ("Python Python Python Matplotlib Matplotlib Seaborn Network Plot Violin Chart Pandas Datascience Wordcloud Spider Radar Parrallel Alpha Color Brewer Density Scatter Barplot Barplot Boxplot Violinplot Treemap Stacked Area Chart Chart Visualization Dataviz Donut Pie Time-Series Wordcloud Wordcloud Sankey Bubble")
# Create the wordcloud object
wordcloud = WordCloud(width=480, height=480,max_words=3).generate(text)
plt.figure()
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.margins(x=0, y=0)
plt.show()
Changing the background color
from wordcloud import WordCloud
import matplotlib.pyplot as plt
# Create the input text: a string of space-separated words
text = ("Python Python Python Matplotlib Matplotlib Seaborn Network Plot Violin Chart Pandas Datascience Wordcloud Spider Radar Parrallel Alpha Color Brewer Density Scatter Barplot Barplot Boxplot Violinplot Treemap Stacked Area Chart Chart Visualization Dataviz Donut Pie Time-Series Wordcloud Wordcloud Sankey Bubble")
# Create the wordcloud object
wordcloud = WordCloud(width=480, height=480,background_color="skyblue").generate(text)
plt.figure()
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.margins(x=0, y=0)
plt.show()
Finally, use a colormap to change the color of the words
from wordcloud import WordCloud
import matplotlib.pyplot as plt
# Create the input text: a string of space-separated words
text = ("Python Python Python Matplotlib Matplotlib Seaborn Network Plot Violin Chart Pandas Datascience Wordcloud Spider Radar Parrallel Alpha Color Brewer Density Scatter Barplot Barplot Boxplot Violinplot Treemap Stacked Area Chart Chart Visualization Dataviz Donut Pie Time-Series Wordcloud Wordcloud Sankey Bubble")
# Create the wordcloud object
wordcloud = WordCloud(width=480, height=480, colormap="Blues").generate(text)
plt.figure()
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.margins(x=0, y=0)
plt.show()
31.3D plotting
31.1.Plotting 2D data on a 3D plot
Demonstrates using the zdir keyword of ax.plot to draw 2D data on selected axes of a 3D plot.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure()
# fig.gca(projection='3d') no longer works in recent Matplotlib releases
ax = fig.add_subplot(111, projection='3d')
# Plot a sin curve using the x and y axes.
x = np.linspace(0, 1, 100)
y = np.sin(x * 2 * np.pi) / 2 + 0.5
# zdir='z' draws the curve on the x and y axes, with z fixed to zs=0
ax.plot(x, y, zs=0, zdir='z', label='curve in (x, y)')
# Plot scatterplot data (20 2D points per colour) on the x and z axes.
colors = ('r', 'g', 'b', 'k')
# Fixing random state for reproducibility
np.random.seed(19680801)
x = np.random.sample(20 * len(colors))
y = np.random.sample(20 * len(colors))
c_list = []
for c in colors:
    c_list.extend([c] * 20)
# By using zdir='y', the y value of these points is fixed to the zs value 0
# and the (x, y) points are plotted on the x and z axes as scatter points.
ax.scatter(x, y, zs=0, zdir='y', c=c_list, label='points in (x, z)')
# Make legend, set axes limits and labels
ax.legend()
ax.set_xlim(0, 1)
ax.set_ylim(0, 1)
ax.set_zlim(0, 1)
ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_zlabel('Z')
# Customize the view angle so it's easier to see that the scatter points lie
# on the plane y=0
ax.view_init(elev=20., azim=-35)
plt.show()
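The effect of zdir in the example above can be sketched without any plotting: the axis named by zdir receives the constant zs, and the 2D data fills the two remaining axes in order. The helper place_2d_in_3d below is hypothetical and only covers the three plain values 'x', 'y' and 'z'; it illustrates the behaviour described here and is not Matplotlib's own code.

```python
def place_2d_in_3d(xs, ys, zs=0.0, zdir="z"):
    """Return (x, y, z) coordinate lists for 2D points drawn with a given zdir.

    Hypothetical helper for illustration: the axis named by zdir is held at
    the constant zs, and the 2D data is spread over the other two axes.
    """
    fixed = [zs] * len(xs)
    if zdir == "x":
        return fixed, list(xs), list(ys)
    if zdir == "y":
        return list(xs), fixed, list(ys)
    if zdir == "z":
        return list(xs), list(ys), fixed
    raise ValueError("zdir must be 'x', 'y' or 'z'")


# Two 2D points drawn with zdir='y': they end up in the plane y = 0.
x3, y3, z3 = place_2d_in_3d([0.1, 0.2], [0.5, 0.9], zs=0, zdir="y")
print(x3, y3, z3)
```

With zdir='y', the second 2D coordinate becomes the z coordinate, which is why the scatter points in the example lie on the plane y=0.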
31.2.3D scatterplot
Demonstration of a basic scatterplot in 3D.
import matplotlib.pyplot as plt
import numpy as np
# Fixing random state for reproducibility
np.random.seed(19680801)
def randrange(n, vmin, vmax):
    '''
    Helper function to make an array of random numbers having shape (n, )
    with each number distributed Uniform(vmin, vmax).
    '''
    return (vmax - vmin)*np.random.rand(n) + vmin
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
n = 100
# For each set of style and range settings, plot n random points in the box
# defined by x in [23, 32], y in [0, 100], z in [zlow, zhigh].
for m, zlow, zhigh in [('o', -50, -25), ('^', -30, -5)]:
    xs = randrange(n, 23, 32)
    ys = randrange(n, 0, 100)
    zs = randrange(n, zlow, zhigh)
    ax.scatter(xs, ys, zs, marker=m)
ax.set_xlabel('X Label')
ax.set_ylabel('Y Label')
ax.set_zlabel('Z Label')
plt.show()
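The randrange helper above rescales np.random.rand by hand. As an alternative sketch, NumPy's Generator API can draw from a uniform range directly. The name randrange_rng is hypothetical, and although the same seed value is reused here, the numbers produced differ from those of the legacy np.random.seed stream.

```python
import numpy as np


def randrange_rng(rng, n, vmin, vmax):
    """Return n samples drawn uniformly from [vmin, vmax)."""
    return rng.uniform(vmin, vmax, size=n)


rng = np.random.default_rng(19680801)  # seeded for reproducibility
xs = randrange_rng(rng, 100, 23, 32)
ys = randrange_rng(rng, 100, 0, 100)
zs = randrange_rng(rng, 100, -50, -25)
print(xs.min(), xs.max())
```

Passing the generator explicitly keeps the random state local instead of relying on NumPy's global seed.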
31.3.3D surface (color map)
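Demonstrates plotting a 3D surface colored with the coolwarm colormap. The surface is made opaque by using antialiased=False.
Also demonstrates using LinearLocator and a custom format string for the z-axis tick labels.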
import matplotlib.pyplot as plt
from matplotlib import cm
from matplotlib.ticker import LinearLocator, FormatStrFormatter
import numpy as np
fig = plt.figure()
# fig.gca(projection='3d') no longer works in recent Matplotlib releases
ax = fig.add_subplot(111, projection='3d')
# Make data.
X = np.arange(-5, 5, 0.25)
Y = np.arange(-5, 5, 0.25)
X, Y = np.meshgrid(X, Y)
R = np.sqrt(X**2 + Y**2)
Z = np.sin(R)
# Plot the surface.
surf = ax.plot_surface(X, Y, Z, cmap=cm.coolwarm,
                       linewidth=0, antialiased=False)
# Customize the z axis.
ax.set_zlim(-1.01, 1.01)
ax.zaxis.set_major_locator(LinearLocator(10))
ax.zaxis.set_major_formatter(FormatStrFormatter('%.02f'))
# Add a color bar which maps values to colors.
fig.colorbar(surf, shrink=0.5, aspect=5)
plt.show()
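The "Make data" step in the surface example relies on np.meshgrid to turn two 1D coordinate vectors into 2D grids. The short sketch below repeats that step on its own and inspects the shapes; the lowercase names x and y and the sample indices row and col are chosen here only for illustration.

```python
import numpy as np

x = np.arange(-5, 5, 0.25)   # 40 samples: -5.0, -4.75, ..., 4.75
y = np.arange(-5, 5, 0.25)
X, Y = np.meshgrid(x, y)     # both grids have shape (len(y), len(x))
R = np.sqrt(X**2 + Y**2)     # distance of each grid point from the origin
Z = np.sin(R)                # surface height at each grid point

# X varies along columns and Y varies along rows.
row, col = 3, 7
print(X.shape, X[row, col] == x[col], Y[row, col] == y[row])
```

Because Z is a sine, all heights fall within [-1, 1], which is why the example sets the z limits to just outside that range.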
32.Matrix visualization (Matshow)
Simple matshow example.
import matplotlib.pyplot as plt
import numpy as np
def samplemat(dims):
    """Make a matrix with all zeros and increasing elements on the diagonal"""
    aa = np.zeros(dims)
    for i in range(min(dims)):
        aa[i, i] = i
    return aa
# Display matrix
plt.matshow(samplemat((15, 15)))
plt.show()
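For square matrices, the loop in samplemat can be replaced by a single NumPy call. The sketch below compares the two; it assumes the square case only, since np.diag on a 1D array always returns a square matrix, and the name vectorized is made up for this example.

```python
import numpy as np


def samplemat(dims):
    """Zero matrix with 0, 1, 2, ... on the diagonal (same as the example above)."""
    aa = np.zeros(dims)
    for i in range(min(dims)):
        aa[i, i] = i
    return aa


n = 15
vectorized = np.diag(np.arange(n, dtype=float))  # square case only
print(np.array_equal(samplemat((n, n)), vectorized))
```

The loop version remains the more general one, because it also handles non-square dims.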
import numpy as np
import matplotlib.pyplot as plt
alphabets = ['A', 'B', 'C', 'D', 'E']
# randomly generated array
random_array = np.random.random((5, 5))
figure = plt.figure()
axes = figure.add_subplot(111)
# using the matshow() function
caxes = axes.matshow(random_array, interpolation='nearest')
figure.colorbar(caxes)
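# Fix the tick positions first so that each label lines up with one row/column
axes.set_xticks(np.arange(len(alphabets)))
axes.set_yticks(np.arange(len(alphabets)))
axes.set_xticklabels(alphabets)
axes.set_yticklabels(alphabets)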
plt.show()
33.Plotting a confusion matrix
import itertools
import numpy as np
import matplotlib.pyplot as plt
def plot_confusion_matrix(cm,
                          target_names,
                          title='Confusion matrix',
                          cmap=None,
                          normalize=True):
    """
    Given a sklearn confusion matrix (cm), make a nice plot.

    Arguments
    ---------
    cm:           confusion matrix from sklearn.metrics.confusion_matrix
    target_names: given classification classes such as [0, 1, 2]
                  the class names, for example: ['high', 'medium', 'low']
    title:        the text to display at the top of the matrix
    cmap:         the gradient of the values displayed from matplotlib.pyplot.cm
                  see http://matplotlib.org/examples/color/colormaps_reference.html
                  plt.get_cmap('jet') or plt.cm.Blues
    normalize:    If False, plot the raw numbers
                  If True, plot the proportions

    Usage
    -----
    plot_confusion_matrix(cm           = cm,                  # confusion matrix created by
                                                              # sklearn.metrics.confusion_matrix
                          normalize    = True,                # show proportions
                          target_names = y_labels_vals,       # list of names of the classes
                          title        = best_estimator_name) # title of graph

    Citation
    --------
    http://scikit-learn.org/stable/auto_examples/model_selection/plot_confusion_matrix.html
    """
    # Overall accuracy: correctly classified samples (the diagonal) over all samples
    accuracy = np.trace(cm) / float(np.sum(cm))
    misclass = 1 - accuracy
    if cmap is None:
        cmap = plt.get_cmap('Blues')
    plt.figure(figsize=(8, 6))
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    if target_names is not None:
        tick_marks = np.arange(len(target_names))
        plt.xticks(tick_marks, target_names, rotation=45)
        plt.yticks(tick_marks, target_names)
    if normalize:
        # Divide each row by its sum so every row shows per-class proportions
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
    # Cells above the threshold get white text, the rest black, for contrast
    thresh = cm.max() / 1.5 if normalize else cm.max() / 2
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        if normalize:
            plt.text(j, i, "{:0.4f}".format(cm[i, j]),
                     horizontalalignment="center",
                     color="white" if cm[i, j] > thresh else "black")
        else:
            plt.text(j, i, "{:,}".format(cm[i, j]),
                     horizontalalignment="center",
                     color="white" if cm[i, j] > thresh else "black")
    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label\naccuracy={:0.4f}; misclass={:0.4f}'.format(accuracy, misclass))
    plt.show()
plot_confusion_matrix(cm=np.array([[1098, 1934, 807],
                                   [604, 4392, 6233],
                                   [162, 2362, 31760]]),
                      normalize=False,
                      target_names=['high', 'medium', 'low'],
                      title="Confusion Matrix")
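The function above expects a ready-made count matrix. As a rough sketch of where such a matrix comes from, the snippet below builds one from integer class labels with plain NumPy, using the same layout as sklearn.metrics.confusion_matrix (rows are true classes, columns are predicted classes), and then derives the accuracy and the row-normalized proportions that the plot displays. The helper confusion_counts and the label lists are made up for illustration.

```python
import numpy as np


def confusion_counts(y_true, y_pred, n_classes):
    """Count matrix with rows = true class, columns = predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    # np.add.at accumulates correctly even when an index pair repeats
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm


y_true = [0, 0, 1, 1, 2, 2, 2]   # made-up labels for illustration
y_pred = [0, 1, 1, 1, 2, 0, 2]
cm = confusion_counts(y_true, y_pred, n_classes=3)

accuracy = np.trace(cm) / cm.sum()                    # correct / total
row_normalized = cm / cm.sum(axis=1, keepdims=True)   # each row sums to 1
print(cm)
print(accuracy)
```

The resulting cm can be passed to plot_confusion_matrix with matching target_names, with normalize=True or normalize=False.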