python & tensorflow & keras Summary (Part 1)


Shared by 飞快大叔 on the 靠谱客 blog: a collection of notes on python, tensorflow, and keras gathered during recent development work, posted here for reference.

Overview

Steadily refining your own understanding every day is a real pleasure!

Other parts of this series by MOMO (zhuanlan.zhihu.com):

MOMO: python & tensorflow & keras Summary (1)

MOMO: python & tensorflow & keras Summary (2)

MOMO: python & tensorflow & keras Summary (3)

50. itertuples(): iterate over a DataFrame row by row, yielding each row as a namedtuple.

import pandas as pd

# Named records rather than list, so the built-in list() is not shadowed.
records = [{'c1': 10, 'c2': 100}, {'c1': 11, 'c2': 110}, {'c1': 12, 'c2': 123}]

Output: [{'c1': 10, 'c2': 100}, {'c1': 11, 'c2': 110}, {'c1': 12, 'c2': 123}]

df = pd.DataFrame(records)

Output:

   c1   c2
0  10  100
1  11  110
2  12  123

for row in df.itertuples():
    print(row)

Output:

Pandas(Index=0, c1=10, c2=100)
Pandas(Index=1, c1=11, c2=110)
Pandas(Index=2, c1=12, c2=123)

for row in df.itertuples():
    print(getattr(row, 'c1'), getattr(row, 'c2'))

Output:

10 100
11 110
12 123

for row in df.itertuples():
    print(row.c1)

Output:

10
11
12

for row in df.itertuples():
    print(row.c2)

Output:

100
110
123

49. items(): iterate over a DataFrame column by column, yielding (column name, Series) pairs. Older pandas versions called this method iteritems(); that name was deprecated in pandas 1.5 and removed in pandas 2.0.

import pandas as pd

records = [{'c1': 10, 'c2': 100}, {'c1': 11, 'c2': 110}, {'c1': 12, 'c2': 123}]

Output: [{'c1': 10, 'c2': 100}, {'c1': 11, 'c2': 110}, {'c1': 12, 'c2': 123}]

df = pd.DataFrame(records)

Output:

   c1   c2
0  10  100
1  11  110
2  12  123

for name, column in df.items():
    print(name)

Output:

c1
c2

for name, column in df.items():
    print(column)

Output:

0    10
1    11
2    12
Name: c1, dtype: int64
0    100
1    110
2    123
Name: c2, dtype: int64

48. iterrows(): iterate over a DataFrame row by row, yielding (row index, Series) pairs.

import pandas as pd

records = [{'c1': 10, 'c2': 100}, {'c1': 11, 'c2': 110}, {'c1': 12, 'c2': 123}]

Output: [{'c1': 10, 'c2': 100}, {'c1': 11, 'c2': 110}, {'c1': 12, 'c2': 123}]

df = pd.DataFrame(records)

Output:

   c1   c2
0  10  100
1  11  110
2  12  123

for index, row in df.iterrows():
    print(index)

Output:

0
1
2

for index, row in df.iterrows():
    print(row)

Output:

c1     10
c2    100
Name: 0, dtype: int64
c1     11
c2    110
Name: 1, dtype: int64
c1     12
c2    123
Name: 2, dtype: int64

for index, row in df.iterrows():
    print(row['c1'], row['c2'])

Output:

10 100
11 110
12 123

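47. Intersection of two lists (via sets)

list1 = [1, 2, 3, 4, 5, 6, 7]
list2 = [4, 5, 6, 7, 8, 9, 10]

# Sets are unordered, so the order of list3 is not guaranteed; sort it if order matters.
list3 = list(set(list1).intersection(set(list2)))

Output: [4, 5, 6, 7]

len_list3 = len(list3)

Output: 4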
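Because a set intersection promises no particular order, a minimal sketch for getting a stable result could either sort the set result or filter the first list directly. The names `common_sorted`, `lookup`, and `common_in_order` are illustrative and not from the original notes.

```python
list1 = [1, 2, 3, 4, 5, 6, 7]
list2 = [4, 5, 6, 7, 8, 9, 10]

# The & operator is equivalent to set.intersection(); sorted() makes the order deterministic.
common_sorted = sorted(set(list1) & set(list2))

# Alternative that keeps the order in which elements appear in list1.
lookup = set(list2)
common_in_order = [x for x in list1 if x in lookup]

print(common_sorted)
print(common_in_order)
```

The second form is handy when the original order carries meaning, for example a ranked list of ids.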

46. Removing duplicates in pandas

import pandas as pd

data1 = pd.DataFrame({'A': ['a', 'a', 'a', 'a'], 'B': [1, 1, 2, 2]})

Output:

   A  B
0  a  1
1  a  1
2  a  2
3  a  2

data2 = pd.DataFrame({'A': ['b', 'b', 'b', 'b'], 'B': [3, 3, 4, 4]})

Output:

   A  B
0  b  3
1  b  3
2  b  4
3  b  4

data_concat = pd.concat([data1, data2], axis=0)

Output:

   A  B
0  a  1
1  a  1
2  a  2
3  a  2
0  b  3
1  b  3
2  b  4
3  b  4

# Here subset=["A", "B"] is equivalent to subset=None: all columns are considered, and rows whose values match in every column are deduplicated.
data_concat_drop1 = data_concat.drop_duplicates(subset=["A", "B"], inplace=False)

Output of data_concat_drop1:

   A  B
0  a  1
2  a  2
0  b  3
2  b  4

data_concat_drop1 = data_concat.drop_duplicates(subset=None, inplace=False)

Output:

   A  B
0  a  1
2  a  2
0  b  3
2  b  4

# Deduplicate rows that share the same value in column A; keep='first' keeps the first occurrence of each duplicate.
data_concat_drop2 = data_concat_drop1.drop_duplicates(subset=["A"], keep='first', inplace=False)

Output of data_concat_drop2:

   A  B
0  a  1
0  b  3

# Deduplicate rows that share the same value in column A; keep='last' keeps the last occurrence of each duplicate.
data_concat_drop3 = data_concat_drop1.drop_duplicates(subset=["A"], keep='last', inplace=False)

Output of data_concat_drop3:

   A  B
2  a  2
2  b  4

# Deduplicate rows that share the same value in column A; keep=False drops every row that has a duplicate.
data_concat_drop4 = data_concat_drop1.drop_duplicates(subset=["A"], keep=False, inplace=False)

Output of data_concat_drop4:

Empty DataFrame
Columns: [A, B]
Index: []

# inplace=True removes the duplicates directly from the original DataFrame, while the default False returns a copy.
data_concat_drop1.drop_duplicates(subset=["A"], keep='first', inplace=True)

Output of data_concat_drop1:

   A  B
0  a  1
0  b  3

45. pd.concat()

import pandas as pd
import numpy as np

df1 = pd.DataFrame(np.random.randint(0, 5, (2, 3)))

Output:

   0  1  2
0  2  4  3
1  4  3  1

df2 = pd.DataFrame(np.random.randint(5, 10, (2, 3)))

Output:

   0  1  2
0  9  9  8
1  5  9  8

df_concat_0 = pd.concat([df1, df2], axis=0)  # stack vertically (rows are appended)

Output:

   0  1  2
0  2  4  3
1  4  3  1
0  9  9  8
1  5  9  8

df_concat_1 = pd.concat([df1, df2], axis=1)  # join horizontally (columns are appended)

Output:

   0  1  2  0  1  2
0  2  4  3  9  9  8
1  4  3  1  5  9  8

44. tqdm parameters: desc and total

import time
from tqdm import tqdm

for i in tqdm(range(100)):
    time.sleep(0.01)

Output: 100%|██████████| 100/100 [00:01<00:00, 99.14it/s]

# desc sets the label shown in front of the bar.
for i in tqdm(range(100), desc="desc test"):
    time.sleep(0.01)

Output: desc test: 100%|██████████| 100/100 [00:01<00:00, 96.74it/s]

# total sets the expected number of iterations; it matters mainly for iterables without a length.
for i in tqdm(range(100), desc="total test", total=len(range(100))):
    time.sleep(0.01)

Output: total test: 100%|██████████| 100/100 [00:01<00:00, 98.64it/s]

43. Using tqdm with pandas

import pandas as pd
import numpy as np
from tqdm import tqdm

df = pd.DataFrame(np.random.randint(0, 10, (500, 20)))

Output (500 rows x 20 columns of integers in the range [0, 9]):

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

0 2 8 1 1 7 5 1 0 2 3 3 4 1 2 5 6 4 4 7 4

1 0 6 3 4 8 1 4 5 5 1 4 3 2 8 9 5 9 6 6 6

2 0 5 7 8 2 2 0 7 2 8 2 0 6 9 9 3 0 7 8 3

3 7 1 1 6 5 3 1 7 9 2 5 1 2 5 0 5 1 7 1 7

4 0 4 5 7 3 0 3 5 9 0 1 1 9 6 8 2 3 9 7 4

.. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. ..

495 9 5 7 4 9 5 6 5 5 2 5 0 1 3 7 3 8 8 9 4

496 9 2 5 2 4 4 3 7 9 5 4 3 0 1 3 1 8 6 1 5

497 1 9 8 9 0 9 9 2 6 7 7 6 3 7 5 5 9 0 9 5

498 3 1 1 5 3 6 8 6 7 0 4 3 5 4 4 1 0 5 9 4

499 9 2 9 3 2 4 7 6 0 3 9 7 5 4 2 9 4 2 5 9

[500 rows x 20 columns]

tqdm.pandas(desc="my bar!")

# axis=0 applies the function to each column, so the bar counts 20 steps.
df.progress_apply(lambda x: x**2, axis=0)

Output:

my bar!: 100%|██████████| 20/20 [00:00<00:00, 623.35it/s]

0 1 2 3 4 5 6 7 8 ... 11 12 13 14 15 16 17 18 19

0 4 64 1 1 49 25 1 0 4 ... 16 1 4 25 36 16 16 49 16

1 0 36 9 16 64 1 16 25 25 ... 9 4 64 81 25 81 36 36 36

2 0 25 49 64 4 4 0 49 4 ... 0 36 81 81 9 0 49 64 9

3 49 1 1 36 25 9 1 49 81 ... 1 4 25 0 25 1 49 1 49

4 0 16 25 49 9 0 9 25 81 ... 1 81 36 64 4 9 81 49 16

.. .. .. .. .. .. .. .. .. .. ... .. .. .. .. .. .. .. .. ..

495 81 25 49 16 81 25 36 25 25 ... 0 1 9 49 9 64 64 81 16

496 81 4 25 4 16 16 9 49 81 ... 9 0 1 9 1 64 36 1 25

497 1 81 64 81 0 81 81 4 36 ... 36 9 49 25 25 81 0 81 25

498 9 1 1 25 9 36 64 36 49 ... 9 25 16 16 1 0 25 81 16

499 81 4 81 9 4 16 49 36 0 ... 49 25 16 4 81 16 4 25 81

[500 rows x 20 columns]

# axis=1 applies the function to each row, so the bar counts 500 steps.
df.progress_apply(lambda x: x**2, axis=1)

Output:

my bar!: 100%|██████████| 500/500 [00:00<00:00, 623.35it/s]

0 1 2 3 4 5 6 7 8 ... 11 12 13 14 15 16 17 18 19

0 4 64 1 1 49 25 1 0 4 ... 16 1 4 25 36 16 16 49 16

1 0 36 9 16 64 1 16 25 25 ... 9 4 64 81 25 81 36 36 36

2 0 25 49 64 4 4 0 49 4 ... 0 36 81 81 9 0 49 64 9

3 49 1 1 36 25 9 1 49 81 ... 1 4 25 0 25 1 49 1 49

4 0 16 25 49 9 0 9 25 81 ... 1 81 36 64 4 9 81 49 16

.. .. .. .. .. .. .. .. .. .. ... .. .. .. .. .. .. .. .. ..

495 81 25 49 16 81 25 36 25 25 ... 0 1 9 49 9 64 64 81 16

496 81 4 25 4 16 16 9 49 81 ... 9 0 1 9 1 64 36 1 25

497 1 81 64 81 0 81 81 4 36 ... 36 9 49 25 25 81 0 81 25

498 9 1 1 25 9 36 64 36 49 ... 9 25 16 16 1 0 25 81 16

499 81 4 81 9 4 16 49 36 0 ... 49 25 16 4 81 16 4 25 81

[500 rows x 20 columns]

42. Using tqdm with an iterable

from tqdm import tqdm

x = ["a", "b", "c", "d"]
x_tqdm = tqdm(x)

Output: 0%| | 0/4 [00:00<?, ?it/s]

import time

for char in x_tqdm:
    time.sleep(1)

Output:

25%|██▌ | 1/4 [01:03<03:09, 63.07s/it]
50%|█████ | 2/4 [01:04<01:28, 44.45s/it]
75%|███████▌ | 3/4 [01:05<00:31, 31.42s/it]
100%|██████████| 4/4 [01:06<00:00, 16.52s/it]

for char in tqdm(["a", "b", "c", "d"]):
    time.sleep(1)

Output:

0%| | 0/4 [00:00<?, ?it/s]
25%|██▌ | 1/4 [00:01<00:03, 1.00s/it]
50%|█████ | 2/4 [00:02<00:02, 1.00s/it]
75%|███████▌ | 3/4 [00:03<00:01, 1.00s/it]
100%|██████████| 4/4 [00:04<00:00, 1.00s/it]

for x, i in enumerate(x_tqdm):
    print("x=%s,i=%s." % (x, i))

Output:

x=0,i=a.

25%|██▌ | 1/4 [00:14<00:43, 14.55s/it]

x=1,i=b.

50%|█████ | 2/4 [00:15<00:20, 10.48s/it]

x=2,i=c.

75%|███████▌ | 3/4 [00:16<00:07, 7.64s/it]

x=3,i=d.

100%|██████████| 4/4 [00:17<00:00, 4.39s/it]

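41. pd.read_csv() and to_csv()

import pandas as pd

# header selects the row used as the column names. By default it is inferred as 0 (the first row is the header); header=None means the file has no header row.
trainset = pd.read_csv(r"E:\data\trainset.csv", header=None)

# No header row is written here; header=False is the documented spelling for to_csv.
trainset.to_csv("data/testset.csv", header=None)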

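40. Computing AUC and log loss

from sklearn.metrics import log_loss, roc_auc_score

auc = roc_auc_score(y, pred)
logloss = log_loss(y, pred)

Where y holds the true labels and pred the predicted probabilities, one row per sample (only the first few rows of each array are shown; the two arrays must have the same length):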

y = [[0.]

[0.]

[1.]

[0.]]

pred = [[0.00261149]

[0.01149073]

[0.00294780]

[0.00335302]

[0.00668004]]

Output:

auc = 0.7806122448979591

logloss = 0.092364388479412

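39. Converting a NumPy array to a tensor

import numpy as np
import tensorflow as tf

a = np.arange(8).reshape(2, 2, 2)
print(a)

Output:

[[[0 1]
  [2 3]]

 [[4 5]
  [6 7]]]

b = tf.convert_to_tensor(a)
print(b)

Output (TensorFlow 1.x graph mode; under TensorFlow 2.x eager execution the values are printed as well):

Tensor("Const:0", shape=(2, 2, 2), dtype=int32)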

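38. tostring() and fromstring()

(1) Convert an image array to a byte string. In recent NumPy versions tostring() is a deprecated alias of tobytes(), so tobytes() is used here.

import numpy as np

a = np.arange(24).reshape(2, 3, 4)
print(a)

Output:

[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]

b = a.tobytes()
print(b)

Output (with a 32-bit integer dtype, as on the author's machine):

b'\x00\x00\x00\x00\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00\x04\x00\x00\x00\x05\x00\x00\x00\x06\x00\x00\x00\x07\x00\x00\x00\x08\x00\x00\x00\t\x00\x00\x00\n\x00\x00\x00\x0b\x00\x00\x00\x0c\x00\x00\x00\r\x00\x00\x00\x0e\x00\x00\x00\x0f\x00\x00\x00\x10\x00\x00\x00\x11\x00\x00\x00\x12\x00\x00\x00\x13\x00\x00\x00\x14\x00\x00\x00\x15\x00\x00\x00\x16\x00\x00\x00\x17\x00\x00\x00'

(2) Restore the byte string to the original image array. np.fromstring() is deprecated for binary data; np.frombuffer() is the replacement.

c = np.frombuffer(b, dtype=np.int32).reshape(2, 3, 4)
print(c)

Output:

[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]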

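37. After parsing a tfrecords file, how do you fetch samples?

step1: Create an iterator for fetching samples. make_one_shot_iterator means the data is read through only once and then discarded.

iterator = tf.compat.v1.data.make_one_shot_iterator(dataset)

Here dataset is the tf.data dataset built in item 36.

step2: Get the next sample

next_element = iterator.get_next()

Output: next_element is a nested tuple of tensors with the structure ((vector, matrix, matrix_shape), label).

step3: Create a Session and keep fetching the next sample

batchs = []
sess = tf.compat.v1.Session()

# The fetched values are part of the graph itself, so no feed_dict is needed:
X, y = sess.run(next_element)  # batch_size = 2

# Store the samples in a list batch by batch. Each list element is a tuple like X below.
# With 100 samples in total and batch_size 2, batchs ends up holding 50 tuples of two samples each.
batchs.append(X)

Output:

X = (array([[ 0.1, 0.1, 1.0],
       [ 0.1, 0.1, 1.4]]),
 array([[ 0.1, 0.1, 0.1, 0.1, 0.1, 0.2],
       [ 0.3, 0.4, 0.1, 0.5, 0.6, 0.5]]),
 array([[ 5, 1],
       [ 2, 1]]))

y = [[0.]
 [1.]]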

36. Parsing tfrecords files with Dataset

Step 1: Define a tf.data.TFRecordDataset to read the tfrecords data.

filenames = ["testset1.tfrecords", "testset2.tfrecords"]
dataset = tf.data.TFRecordDataset(filenames)

Step 2: Shuffle?

batch_size = 1024
dataset = dataset.shuffle(buffer_size=100 * batch_size)

Step 3: prefetch overlaps data preprocessing with downstream computation to improve performance; buffer_size is the number of elements buffered ahead.

batch_size = 1024
dataset = dataset.prefetch(buffer_size=10 * batch_size)

Step 4: Set the batch size

batch_size = 1024
dataset = dataset.batch(batch_size)

Step 5: Run the parsing function parser_function on every batch of the dataset (the dataset is already batched here, which is why parser_function calls tf.io.parse_example).

from multiprocessing import cpu_count

dataset = dataset.map(parser_function, num_parallel_calls=cpu_count())

Step 6: Repeat the dataset indefinitely?

dataset = dataset.repeat()

Step 7: Create the parsing function

# Fixed-length feature spec: tf.io.FixedLenFeature(shape, dtype, default_value)
# shape: acts like a reshape; if the feature was written with tostring(), shape is ().
# dtype: must be one of tf.float32, tf.int64, tf.string.
# default_value: the value used when the feature is missing.

def parser_function(serialized):
    # Configuration for parsing fixed-length input features:
    features_config = {"label": tf.io.FixedLenFeature([1], tf.int64)}
    features_config["vector"] = tf.io.FixedLenFeature([3], tf.float32)
    features_config["matrix"] = tf.io.FixedLenFeature([6], tf.float32)
    features_config["matrix_shape"] = tf.io.FixedLenFeature([2], tf.int64)

    # Parse a batch of serialized samples into a dict of tensors:
    features_tensor = tf.io.parse_example(serialized, features_config)

    # Type conversion:
    label = tf.cast(features_tensor["label"], tf.float32)

    parsed_features = []
    vector = tf.cast(features_tensor["vector"], tf.float32)
    parsed_features.append(vector)
    matrix = tf.cast(features_tensor["matrix"], tf.float32)
    parsed_features.append(matrix)
    matrix_shape = tf.cast(features_tensor["matrix_shape"], tf.int32)
    parsed_features.append(matrix_shape)

    final_return = (tuple(parsed_features), label)
    return final_return

Where:

1. serialized = Tensor("args_0:0", shape=(?,), dtype=string), a one-dimensional vector of strings.

2. features_config = {'label': FixedLenFeature(shape=[1], dtype=tf.int64, default_value=None), 'vector': FixedLenFeature(shape=[3], dtype=tf.float32, default_value=None), 'matrix': FixedLenFeature(shape=[6], dtype=tf.float32, default_value=None), 'matrix_shape': FixedLenFeature(shape=[2], dtype=tf.int64, default_value=None)}

3. features_tensor is a dict of tensors with the keys 'label', 'vector', 'matrix', and 'matrix_shape'.

4. parsed_features is a list of three tensors: [vector, matrix, matrix_shape].

5. final_return is the nested tuple ((vector, matrix, matrix_shape), label).

35. Storing data with TFRecord

step1: Create a writer

import tensorflow as tf

writer = tf.io.TFRecordWriter('%s.tfrecord' % 'testset')

The output file name:

print('%s.tfrecord' % 'testset')

Output: testset.tfrecord

step2: Create tf.train.Feature() objects

When writing data to xxx.tfrecord, first declare the type of each feature. Three data types are available (int64, float32, and string), in the following forms:

features['name'] = tf.train.Feature(int64_list=tf.train.Int64List(value=input))
features['name'] = tf.train.Feature(float_list=tf.train.FloatList(value=input))
features['name'] = tf.train.Feature(bytes_list=tf.train.BytesList(value=input))

tf.train.Feature only accepts list-like input. A matrix or higher-rank tensor must either be flattened into a list or converted to a byte string for storage. Both approaches lose the shape information, so the shape should be stored separately.

# The example below stores all features of one sample as tf.train.Feature() objects. Assume the sample has only 4 features, whose data types are a scalar, a vector, a matrix, and a tensor.

# Create the dict
features = {}

# Write a scalar of type Int64. Because it is a scalar, wrap it in brackets to make a list.
features['label'] = tf.train.Feature(int64_list=tf.train.Int64List(value=[label]))

# Write a vector of type float. It is already a list, so no brackets are added.
features['vector'] = tf.train.Feature(float_list=tf.train.FloatList(value=vectors))

# Write a matrix of type float. One approach is to flatten the matrix into a list.
features['matrix'] = tf.train.Feature(float_list=tf.train.FloatList(value=matrices.reshape(-1)))

# To avoid losing the matrix shape (2, 3), store the shape as well so the matrix can be restored later.
features['matrix_shape'] = tf.train.Feature(int64_list=tf.train.Int64List(value=matrices.shape))

# Write a 3-D tensor of type float. The other approach is to convert it to bytes (brackets required) and convert it back later.
# tobytes() replaces the deprecated tostring() alias.
features['tensor'] = tf.train.Feature(bytes_list=tf.train.BytesList(value=[tensors.tobytes()]))

# Store the shape information (2, 2, 2) that would otherwise be lost.
features['tensor_shape'] = tf.train.Feature(int64_list=tf.train.Int64List(value=tensors.shape))

step3: Pass the dict of all features to tf.train.Features() (note the difference between tf.train.Feature and tf.train.Features: the latter has an extra "s")

tf_features = tf.train.Features(feature=features)

step4: Convert the tf.train.Features() into a tf.train.Example()

tf_example = tf.train.Example(features=tf_features)

step5: Serialize the sample

tf_serialized = tf_example.SerializeToString()

step6: Write the sample

writer.write(tf_serialized)
writer.close()

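34. TFRecord builds its data as a dictionary. Benefits of storing data with TFRecord:

1. Building the graph is more convenient. With a placeholder you have to pass feed_dict on every run, while with TFRecord + Dataset the data-reading operation is itself a node in the graph, so nothing has to be fed each time.

2. It integrates easily with Estimators.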

33. Formatted output with print()

%s formats an object as a string; s stands for string.

%d formats an object as a decimal integer; d stands for decimal.

%f formats an object as a floating-point number; f stands for float.

filenames = ["trainset1.csv", "trainset2.csv"]
file = "trainset"

print("the number of %s is : %d" % (file, len(filenames)))

Output: the number of trainset is : 2

print("the number of %s is : %f" % (file, len(filenames)))

Output: the number of trainset is : 2.000000

print("the number of %s is : %.2f" % (file, len(filenames)))

Output: the number of trainset is : 2.00
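For comparison, the same output can be produced with an f-string (Python 3.6+). This is a small sketch rather than part of the original notes; `old_style` and `f_string` are illustrative names.

```python
filenames = ["trainset1.csv", "trainset2.csv"]
file = "trainset"

# %-formatting, as in the examples above.
old_style = "the number of %s is : %.2f" % (file, len(filenames))

# f-string: the expression and its format spec (.2f) sit inside the braces.
f_string = f"the number of {file} is : {len(filenames):.2f}"

print(old_style)
print(f_string)
```

Both lines print the same text; the f-string keeps each value next to the place where it is rendered.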

32. glob

glob finds file paths that match a given pattern, much like the file search in Windows.

import glob

filenames = glob.glob('/data/pengpeng/test/image_*.jpg')

Output:

['/data/pengpeng/test/image_1.jpg', '/data/pengpeng/test/image_2.jpg', '/data/pengpeng/test/image_3.jpg', '/data/pengpeng/test/image_4.jpg', '/data/pengpeng/test/image_5.jpg']
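To try the pattern matching without depending on a particular directory, the following sketch creates a few empty files in a temporary folder and globs them. The file names are made up for illustration, and the result is sorted because glob.glob() does not guarantee an order.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Create two files that match the pattern and one that does not.
    for name in ["image_1.jpg", "image_2.jpg", "notes.txt"]:
        open(os.path.join(tmp_dir, name), "w").close()

    pattern = os.path.join(tmp_dir, "image_*.jpg")
    matched = sorted(os.path.basename(p) for p in glob.glob(pattern))

print(matched)
```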

31. Using and allocating GPUs (tf.config)

List the devices of a given type (GPU or CPU) on the current host:

gpus = tf.config.experimental.list_physical_devices(device_type='GPU')
cpus = tf.config.experimental.list_physical_devices(device_type='CPU')
print(gpus, cpus)

Output:

[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU'),
 PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU'),
 PhysicalDevice(name='/physical_device:GPU:2', device_type='GPU'),
 PhysicalDevice(name='/physical_device:GPU:3', device_type='GPU')]
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')]

This means the machine has 4 GPUs (GPU:0, GPU:1, GPU:2, GPU:3) and one CPU (CPU:0) available.

Restrict the current program to the two GPUs with indices 0 and 1 (GPU:0 and GPU:1):

gpus = tf.config.experimental.list_physical_devices(device_type='GPU')
tf.config.experimental.set_visible_devices(devices=gpus[0:2], device_type='GPU')

Alternatively, the following code restricts the program to GPUs 2 and 3:

import os

os.environ['CUDA_VISIBLE_DEVICES'] = "2,3"

Set the GPU memory policy to "allocate memory only when needed". The following code applies this policy to all GPUs:

gpus = tf.config.experimental.list_physical_devices(device_type='GPU')

for gpu in gpus:
    # The flag is passed positionally: a positional argument cannot follow a keyword argument.
    tf.config.experimental.set_memory_growth(gpu, True)

30. String formatting with str.format()

print("Title: {title}\nURL: {url}".format(title="python & tensorflow & keras 笔记", url="https://zhuanlan.zhihu.com/p/153360838"))

Output:

Title: python & tensorflow & keras 笔记
URL: https://zhuanlan.zhihu.com/p/153360838

Passing the arguments as a dict (named params rather than dict, so the built-in is not shadowed):

params = {"title": "《python & tensorflow & keras 笔记》", "url": "https://zhuanlan.zhihu.com/p/153360838"}

print("Title: {title}\nURL: {url}".format(**params))

Output:

Title: 《python & tensorflow & keras 笔记》
URL: https://zhuanlan.zhihu.com/p/153360838

Passing the arguments as a list:

values = ["《python & tensorflow & keras 笔记》", "https://zhuanlan.zhihu.com/p/153360838"]

print("Title: {0[0]}\nURL: {0[1]}".format(values))

Output:

Title: 《python & tensorflow & keras 笔记》
URL: https://zhuanlan.zhihu.com/p/153360838

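29. Splitting a string into a list of strings

strip() removes the given characters from the beginning and end of a string.

split() splits a string at the given separator into a list of strings.

image = '1.jsp,2.jsp,3.jsp,4.jsp'

image_list = image.strip(',').split(',')

Output:

['1.jsp', '2.jsp', '3.jsp', '4.jsp']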
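Real input is often messier than the string above. As a hedged sketch (the `raw` string is invented for illustration), stray separators and surrounding spaces can be handled by stripping each piece and dropping the empty ones:

```python
raw = ",1.jsp, 2.jsp ,3.jsp,,4.jsp,"

# strip(",") removes the leading and trailing commas;
# split(",") then cuts at every remaining comma, which can leave empty or padded pieces.
pieces = raw.strip(",").split(",")

# Trim whitespace from each piece and skip the pieces that end up empty.
parts = [p.strip() for p in pieces if p.strip()]

print(pieces)
print(parts)
```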

28. Number of days between two timestamps

import time

t1 = int(time.mktime(time.strptime("2020-5-21 17:01:53", "%Y-%m-%d %H:%M:%S")))
t2 = int(time.mktime(time.strptime("2020-6-19 11:37:48", "%Y-%m-%d %H:%M:%S")))

gap = (t2 - t1) / 60 / 60 / 24
gap_day = round(gap)

Output (the absolute timestamps depend on the local time zone; these are from the author's machine at UTC+8):

t1 = 1590051713
t2 = 1592537868
gap = 28.774942129629625
gap_day = 29
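The same difference can be computed with the datetime module, which avoids the detour through local-time timestamps. This is a sketch under the assumption that both strings are in the same time zone; `whole_days` is an illustrative name.

```python
from datetime import datetime

fmt = "%Y-%m-%d %H:%M:%S"
start = datetime.strptime("2020-5-21 17:01:53", fmt)
end = datetime.strptime("2020-6-19 11:37:48", fmt)

# Subtracting two datetime objects gives a timedelta.
delta = end - start

gap = delta.total_seconds() / 86400  # fractional days
gap_day = round(gap)                 # nearest whole day
whole_days = delta.days              # completed days only

print(gap, gap_day, whole_days)
```

Note the difference between `round(gap)` and `delta.days`: the first rounds to the nearest day, the second counts only full 24-hour periods.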

27. time.strptime(), time.mktime(), time.gmtime(), time.localtime(), time.asctime(), time.time()

time.strptime() parses a time string according to the given format and returns a struct_time object (a time tuple).

import time

time_str = "2020-10-07 11:45:58"

t = time.strptime(time_str, "%Y-%m-%d %H:%M:%S")

Output:

time.struct_time(tm_year=2020, tm_mon=10, tm_mday=7, tm_hour=11, tm_min=45, tm_sec=58, tm_wday=2, tm_yday=281, tm_isdst=-1)

Meaning of the fields in the output above (Python's struct_time, which differs from C's struct tm in the year, month, and day-of-year conventions):

tm_year: the year itself, for example 2020
tm_mon: month, range [1, 12]
tm_mday: day of the month, range [1, 31]
tm_hour: hour, range [0, 23]
tm_min: minute, range [0, 59]
tm_sec: second, range [0, 61] (60 allows for leap seconds; 61 is supported for historical reasons)
tm_wday: day of the week, range [0, 6], where 0 is Monday, 1 is Tuesday, and so on
tm_yday: day of the year, range [1, 366], where 1 is January 1
tm_isdst: daylight saving time flag; 1 when DST is in effect, 0 when it is not, -1 when unknown

time.mktime() is the inverse of localtime(): it takes a struct_time expressed in local time and returns the time as a floating-point number of seconds since the epoch.

import time

time_str = "2020-10-07 11:45:58"

t1 = time.strptime(time_str, "%Y-%m-%d %H:%M:%S")
t2 = time.mktime(t1)
t3 = int(t2)

Output:

t1 = time.struct_time(tm_year=2020, tm_mon=10, tm_mday=7, tm_hour=11, tm_min=45, tm_sec=58, tm_wday=2, tm_yday=281, tm_isdst=-1)
t2 = 1602042358.0
t3 = 1602042358

time.gmtime() converts a timestamp to a struct_time in UTC (time zone 0). The optional argument is the number of seconds since 1970-1-1.

time.localtime() is similar to gmtime(), but converts the timestamp to local time. If no argument is given, the current time is used.

import time

time_str = "2020-10-07 11:45:58"

t1 = time.strptime(time_str, "%Y-%m-%d %H:%M:%S")
t2 = time.mktime(t1)
t3 = int(t2)
t4 = time.gmtime(t3)
t5 = time.localtime(t3)
t6 = time.localtime()

Output (local time zone UTC+8):

t1 = time.struct_time(tm_year=2020, tm_mon=10, tm_mday=7, tm_hour=11, tm_min=45, tm_sec=58, tm_wday=2, tm_yday=281, tm_isdst=-1)
t2 = 1602042358.0
t3 = 1602042358
t4 = time.struct_time(tm_year=2020, tm_mon=10, tm_mday=7, tm_hour=3, tm_min=45, tm_sec=58, tm_wday=2, tm_yday=281, tm_isdst=0)
t5 = time.struct_time(tm_year=2020, tm_mon=10, tm_mday=7, tm_hour=11, tm_min=45, tm_sec=58, tm_wday=2, tm_yday=281, tm_isdst=0)
t6 = time.struct_time(tm_year=2020, tm_mon=8, tm_mday=11, tm_hour=15, tm_min=36, tm_sec=26, tm_wday=1, tm_yday=224, tm_isdst=0)

time.time() returns the current timestamp (floating-point seconds elapsed since the 1970 epoch).

time.asctime() takes a time tuple and returns a readable 24-character string of the form 'Tue Aug 11 15:41:20 2020' (Tuesday, August 11, 2020, 15:41:20).

import time

t1 = time.time()
t2 = time.localtime(t1)
t3 = time.asctime(t2)

Output:

t1 = 1597131680.927205
t2 = time.struct_time(tm_year=2020, tm_mon=8, tm_mday=11, tm_hour=15, tm_min=41, tm_sec=20, tm_wday=1, tm_yday=224, tm_isdst=0)
t3 = 'Tue Aug 11 15:41:20 2020'

Measuring elapsed time:

import time

start = time.time()
pred = model.predict(test_set)
end = time.time()

test_time = round(end - start, 2)

print("prediction time = {} s".format(test_time))

Output:

prediction time = 56.42 s
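For timing code, time.perf_counter() is usually a better clock than time.time() because it is monotonic and has higher resolution. The helper below is a minimal sketch; `timed` is a hypothetical name, and `sum(range(1000))` merely stands in for a call such as model.predict(test_set).

```python
import time


def timed(func, *args, **kwargs):
    """Call func(*args, **kwargs) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Stand-in workload; replace with the call you actually want to time.
result, elapsed = timed(sum, range(1000))
print("elapsed = {:.6f} s".format(elapsed))
```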

26. The relationship between python, tensorflow, and keras:

25. tf.reduce_mean(), tf.reduce_sum(), tf.reduce_max(), tf.concat(), concat_fun()

tf.reduce_mean()

tf.reduce_sum()

tf.reduce_max()

tf.concat()

concat_fun([(?,2,10),(?,3,10),(?,1,10)],axis=1) --> (?,5,10)

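24. zip() takes iterables as arguments and packs their corresponding elements into tuples. In Python 3 it returns a lazy iterator over these tuples, so wrap it in list() to get a list (Python 2 returned the list directly). If the iterables have different lengths, the result is as long as the shortest one. With the * operator, the tuples can be unzipped again.

a = [1, 2, 3]
b = [4, 5, 6]
c = [4, 5, 6, 7, 8]

# Pack into a list of tuples:
zipped = list(zip(a, b))

Output: [(1, 4), (2, 5), (3, 6)]

# The number of elements matches the shortest list:
list(zip(a, c))

Output: [(1, 4), (2, 5), (3, 6)]

# The reverse of zip: *zipped unpacks the tuples, giving back the original rows:
list(zip(*zipped))

Output: [(1, 2, 3), (4, 5, 6)]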
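Two points are worth seeing in running code: a zip object can be consumed only once, and zip pairs naturally with dict(). The sketch below uses made-up `names` and `values` lists.

```python
names = ["c1", "c2", "c3"]
values = [10, 100, 1000]

# zip() is lazy in Python 3: the first list() drains it, the second gets nothing.
lazy = zip(names, values)
first_pass = list(lazy)
second_pass = list(lazy)

# Building a dict from two parallel lists.
mapping = dict(zip(names, values))

# Unzipping: * spreads the pairs back into separate tuples.
unzipped_names, unzipped_values = zip(*first_pass)

print(first_pass, second_pass)
print(mapping)
```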

23. Predefined initializers

All zeros: keras.initializers.Zeros()

All ones: keras.initializers.Ones()

Constant value: keras.initializers.Constant(value=0)

Normal distribution: keras.initializers.RandomNormal(mean=0.0, stddev=0.05, seed=None)

22. tf.keras.layers.Reshape()

21. tf.keras.layers.RepeatVector()

20. Z = tf.transpose(Z, perm=[0, 2, 1])

Example 1: x is a 2-D tensor of shape 2*3, transposed into a 2-D tensor of shape 3*2.

x = [[1 2 3]

[4 5 6]]

tf.transpose(x) ==> [[1 4]

[2 5]

[3 6]]

tf.transpose(x, perm=[1, 0]) ==> [[1 4]

[2 5]

[3 6]]

Example 2: x is a 3-D tensor of shape 2*2*3. The last two dimensions are swapped, giving a 3-D tensor of shape 2*3*2.

x = [[[1 2 3]

[4 5 6]]

[[7 8 9]

[10 11 12]]]

tf.transpose(x, perm=[0, 2, 1]) ==> [[[1 4]

[2 5]

[3 6]]

[[7 10]

[8 11]

[9 12]]]
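To see what perm=[0, 2, 1] does without running TensorFlow, the following sketch reproduces Example 2 on plain nested lists. It only imitates this one permutation and is not how tf.transpose is implemented.

```python
x = [[[1, 2, 3], [4, 5, 6]],
     [[7, 8, 9], [10, 11, 12]]]

# perm=[0, 2, 1] keeps axis 0 and swaps the last two axes,
# i.e. every 2*3 matrix inside x is transposed on its own.
transposed = [[list(column) for column in zip(*matrix)] for matrix in x]

for matrix in transposed:
    print(matrix)
```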

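19. A lambda function is an anonymous function, that is, a function without a name. It is more concise than a regular function: it is not declared with the def keyword in the standard way and is limited to a single expression.

Example 1:

# Definition (named total rather than sum, so the built-in sum() is not shadowed):
total = lambda arg1, arg2: arg1 + arg2

# Call:
print("Value of total : ", total(10, 20))
print("Value of total : ", total(20, 20))

# Result:
Value of total :  30
Value of total :  40

Example 2:

lambda x, y: x + y --> <function <lambda> at 0x00000243C2CAA9D8>

add = lambda x, y: x + y

add(1, 1) Output: 2

Example 3: defining the function with the def keyword

def add(x, y):
    return x + y

add(1, 1) Output: 2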
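In practice lambdas are most useful as short throwaway arguments to other functions. A small sketch, using an invented `records` list:

```python
records = [{"c1": 12, "c2": 123}, {"c1": 10, "c2": 100}, {"c1": 11, "c2": 110}]

# key= expects a function; a lambda avoids defining a one-off named helper.
by_c1 = sorted(records, key=lambda r: r["c1"])

# map() applies the lambda to every element.
squares = list(map(lambda v: v ** 2, [1, 2, 3]))

print([r["c1"] for r in by_c1])
print(squares)
```

When the function needs a name or more than one expression, a regular def is the clearer choice.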

18. itertools.combinations()

itertools.combinations(iterable, r) creates an iterator that returns all subsequences of length r from iterable. The items in each subsequence keep the order they have in the input iterable.

Example:

import itertools

list1 = [1, 3, 4, 5]

list2 = list(itertools.combinations(list1, 2))

print(list2)

Output:

[(1, 3), (1, 4), (1, 5), (3, 4), (3, 5), (4, 5)]
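As a quick check, the number of combinations equals the binomial coefficient C(n, r), and itertools.permutations() is the order-sensitive counterpart. The sketch assumes Python 3.8+ for math.comb.

```python
import itertools
import math

list1 = [1, 3, 4, 5]

# Unordered pairs: (1, 3) appears, (3, 1) does not.
pairs = list(itertools.combinations(list1, 2))

# Ordered pairs: both (1, 3) and (3, 1) appear.
ordered_pairs = list(itertools.permutations(list1, 2))

expected_count = math.comb(len(list1), 2)

print(len(pairs), expected_count, len(ordered_pairs))
```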

17. The platform module

platform.platform() returns the operating system name and version, for example Windows-10-10.0.16299-SP0. It can be used as follows:

import platform

if platform.platform()[:3] == "Win":
    pass  # Windows-specific code goes here
elif platform.platform()[:3] == "Lin":
    pass  # Linux-specific code goes here
else:
    pass  # code for other systems goes here

Other platform functions:

# Bit architecture and linkage format, e.g. ('64bit', 'WindowsPE')
platform.architecture()

# Machine type, e.g. AMD64
platform.machine()

# Network name of the computer, e.g. DESK-20180712BN
platform.node()

# Processor information, e.g. Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
platform.processor()

# Summary of all the information above, e.g. uname_result(system='Windows', node='DESK-20180712BN', release='10', version='10.0.16299', machine='AMD64', processor='Intel64 Family 6 Model 78 Stepping 3, GenuineIntel')
platform.uname()
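Instead of slicing the first three characters of platform.platform(), platform.system() returns the system name directly ('Windows', 'Linux', 'Darwin', or an empty string if it cannot be determined). A minimal sketch, where `describe_os` is a hypothetical helper:

```python
import platform


def describe_os(system_name):
    """Return a short message for the given platform.system() value."""
    if system_name == "Windows":
        return "running on Windows"
    if system_name == "Linux":
        return "running on Linux"
    return "running on another system: " + (system_name or "unknown")


message = describe_os(platform.system())
print(message)
```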

16. tf.keras.regularizers.l2(0.001)

tf.keras.regularizers.l2(0.001) applies L2 regularization with factor 0.001, which helps prevent overfitting.

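15. tf.tensordot(a, b, axes)

tensordot performs tensor contraction (generalized matrix multiplication). Its advantage is that a and b can be multiplied even when they have different numbers of dimensions.

import tensorflow as tf

a1 = tf.ones(shape=[2, 3, 3])
a2 = tf.ones(shape=[2, 2, 3])
b = tf.ones(shape=[3, 2, 6])

Example 1: contract the last dimension of a1 ([3]) with the first dimension of b ([3]); c has shape [2, 3, 2, 6].

c = tf.tensordot(a1, b, axes=1)

Example 2: contract the last 2 dimensions of a2 ([2, 3]) with the first 2 dimensions of b ([3, 2]); d has shape [2, 6].

d = tf.tensordot(a2, b, axes=2)

Example 3: contract axis 1 of a2 with axis 1 of b, equivalent to [2, 3, 2] * [2, 3, 6]; e has shape [2, 3, 3, 6].

e = tf.tensordot(a2, b, axes=(1, 1))

Example 4: contract axes 1 and 2 of a2 with axes 0 and 1 of b, i.e. [2, 2*3] * [3*2, 6] = [2, 6] * [6, 6] = [2, 6]; f has shape [2, 6].

f = tf.tensordot(a2, b, axes=((1, 2), (0, 1)))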

14. //

Floor division: the quotient is rounded down to the nearest integer, e.g. 26 // 5 = 5.
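"Rounded down" matters for negative numbers, where floor division differs from simply dropping the fractional part. A short illustration:

```python
# Floor division and remainder in one call.
quotient, remainder = divmod(26, 5)

# For negative operands, // rounds toward negative infinity ...
floor_negative = -26 // 5

# ... while int() on a true division truncates toward zero.
truncated_negative = int(-26 / 5)

# With a float operand the result is a float, still floored.
float_floor = 26.0 // 5

print(quotient, remainder, floor_negative, truncated_negative, float_floor)
```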

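13. tf.nn.softmax(x, axis)

Purpose of softmax: normalization. The outputs along the chosen axis are positive and sum to 1.

axis: the dimension along which softmax is applied. The default is -1, the last dimension.

The softmax formula:

$\mathrm{softmax}(x)_i = \frac{e^{x_i}}{\sum_j e^{x_j}}$

(Figure: softmax usage example; image not available.)

Example: A is the following 4*3 tensor

A = array([[1., 2., 3.],
       [1., 2., 3.],
       [1., 2., 3.],
       [1., 2., 3.]], dtype=float32)

The result of tf.nn.softmax(A, axis=0) is:

array([[0.25, 0.25, 0.25],
       [0.25, 0.25, 0.25],
       [0.25, 0.25, 0.25],
       [0.25, 0.25, 0.25]], dtype=float32)

where 0.25 = e/(e+e+e+e): each column holds four identical values, so every entry becomes 1/4.

The result of tf.nn.softmax(A, axis=1) is:

array([[0.09003057, 0.24472848, 0.66524094],
       [0.09003057, 0.24472848, 0.66524094],
       [0.09003057, 0.24472848, 0.66524094],
       [0.09003057, 0.24472848, 0.66524094]], dtype=float32)

where 0.09003057 = e/(e+e^2+e^3)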
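The numbers above can be reproduced with a few lines of plain Python. This is a sketch of the formula for a single vector, not TensorFlow's implementation; subtracting the maximum before exponentiating is a common stability trick and does not change the result.

```python
import math


def softmax(values):
    """Softmax of a list of numbers."""
    shift = max(values)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


row = softmax([1.0, 2.0, 3.0])          # one row of A (axis=1)
column = softmax([1.0, 1.0, 1.0, 1.0])  # one column of A (axis=0)

print([round(p, 8) for p in row])
print(column)
```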

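12. Building a list by slicing in a loop

tensor has shape 1024*27*10. Slice it along its second dimension into 27 pieces:

slice_tensor_list = [tf.slice(tensor, [0, i, 0], [-1, 1, -1]) for i in range(27)]

Output: a list of 27 tensors, each of shape (1024, 1, 10).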

11. Getting the shape of a tensor

tensor has shape (1024, 100, 10)

tensor_dim = tensor.get_shape().as_list()[1]

Output: 100

10. Line breaks in print

print("step1:\nstep2:\nstep3:")

Output:

step1:
step2:
step3:

9. Comments

Single-line comment: # import tensorflow as tf

Multi-line comment (a triple-quoted string):

"""
import tensorflow as tf
import tensorflow as tf
import tensorflow as tf
"""

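8. seed(1024)

When seed() is called without an argument, the generated random numbers differ on every run. When seed() is called with an argument, the same random numbers are generated on every run, and different arguments give different sequences.

numpy.random.seed(1024): makes NumPy's random numbers reproducible.

tf.set_random_seed(1234): sets the graph-level random seed (TensorFlow 1.x; in TensorFlow 2.x the call is tf.random.set_seed(1234)).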

7. Masking padded data: tf.sequence_mask(lengths, maxlen)

Example:

mask_data = tf.sequence_mask(lengths=[2, 2, 4], maxlen=6)

Result: array([[ True,  True, False, False, False, False],
       [ True,  True, False, False, False, False],
       [ True,  True,  True,  True, False, False]])

mask_data = tf.sequence_mask(lengths=[2, 2, 4], maxlen=6, dtype=tf.float32), or

mask_data = tf.cast(tf.sequence_mask(lengths=[2, 2, 4], maxlen=6), tf.float32)

Result: array([[1., 1., 0., 0., 0., 0.],
       [1., 1., 0., 0., 0., 0.],
       [1., 1., 1., 1., 0., 0.]], dtype=float32)
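The rule behind the mask is simply "position < length". The pure-Python sketch below mirrors the example on nested lists; the function name matches tf.sequence_mask for readability only, and it is not the TensorFlow implementation.

```python
def sequence_mask(lengths, maxlen):
    """Row i is True at positions 0 .. lengths[i]-1 and False afterwards."""
    return [[position < length for position in range(maxlen)] for length in lengths]


mask = sequence_mask([2, 2, 4], 6)

# Equivalent of dtype=tf.float32: True -> 1.0, False -> 0.0.
mask_float = [[float(flag) for flag in row] for row in mask]

for row in mask_float:
    print(row)
```

Multiplying padded data by such a float mask zeroes out the padded positions.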

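6. tf.split(tensor to split, number of pieces, axis along which to split)

Example: input is a tensor of shape (?, 3, 1, 10)

tf.split(input, 3, axis=1)

Output: a list of 3 tensors, each of shape (?, 1, 1, 10).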

5. Adding and removing tensor dimensions: tf.expand_dims(input, axis) and tf.squeeze(input, axis)

(1) Add a dimension: (?, 800) --> (?, 800, 1)

tf.expand_dims(input, axis=2)

(2) Remove the dimension of size 1: (?, 800, 1) --> (?, 800)

tf.squeeze(input, axis=2)

4. Tensor slicing: tf.slice(input, begin, size)

Example: input is a 3-D tensor of shape 3*2*3

input = [[[1, 1, 1], [2, 2, 2]]
       , [[3, 3, 3], [4, 4, 4]]
       , [[5, 5, 5], [6, 6, 6]]]

(1) tf.slice(input, [1, 0, 0], [1, 1, 3])

==> a tensor of shape 1*1*3: [[[3, 3, 3]]]

(2) tf.slice(input, [1, 0, 0], [1, 2, 3])

==> a tensor of shape 1*2*3: [[[3, 3, 3], [4, 4, 4]]]

(3) tf.slice(input, [1, 0, 0], [2, 1, 3])

==> a tensor of shape 2*1*3: [[[3, 3, 3]], [[5, 5, 5]]]

(4) tf.slice(input, [0, 0, 0], [-1, 1, 3]), where a size of -1 means "all remaining elements along that dimension"

==> a tensor of shape 3*1*3: [[[1, 1, 1]], [[3, 3, 3]], [[5, 5, 5]]]
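The begin/size semantics can be checked without TensorFlow. The helper below is a hypothetical pure-Python imitation for 3-level nested lists (`slice3d` is not a TensorFlow function); it treats a size of -1 as "to the end", as tf.slice does.

```python
def slice3d(x, begin, size):
    """Imitate tf.slice(x, begin, size) on a 3-level nested list."""
    def index_range(start, count, length):
        # A count of -1 selects everything from start to the end of the axis.
        stop = length if count == -1 else start + count
        return range(start, stop)

    rows = index_range(begin[0], size[0], len(x))
    cols = index_range(begin[1], size[1], len(x[0]))
    deps = index_range(begin[2], size[2], len(x[0][0]))
    return [[[x[i][j][k] for k in deps] for j in cols] for i in rows]


data = [[[1, 1, 1], [2, 2, 2]],
        [[3, 3, 3], [4, 4, 4]],
        [[5, 5, 5], [6, 6, 6]]]

print(slice3d(data, [1, 0, 0], [1, 1, 3]))
print(slice3d(data, [1, 0, 0], [2, 1, 3]))
print(slice3d(data, [0, 0, 0], [-1, 1, 3]))
```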

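3. tf.matmul(tensor1, tensor2) and tf.multiply(tensor1, tensor2)

tf.matmul(tensor1, tensor2) is the matrix product of two matrices.

tf.multiply(tensor1, tensor2) is the Hadamard product of two matrices, i.e. element-wise multiplication.

Example 1 (Hadamard product of two 3*3 tensors):

import tensorflow as tf

a = [[1, 2, 3]
   , [1, 2, 3]
   , [1, 2, 3]]

b = [[0, 0, 2]
   , [0, 0, 2]
   , [0, 0, 2]]

# TensorFlow 1.x API; in TensorFlow 2.x it is available as tf.compat.v1.InteractiveSession().
sess = tf.InteractiveSession()
p = tf.multiply(a, b)
print(p)
print(p.eval())
sess.close()

Output:

Tensor("Mul:0", shape=(3, 3), dtype=int32)
[[0 0 6]
 [0 0 6]
 [0 0 6]]

Example 2 (Hadamard product of a 3*2*4 tensor and a 3*2 tensor):

a = [[[1, 1, 1, 1], [2, 2, 2, 1]]
   , [[3, 3, 3, 1], [4, 4, 4, 1]]
   , [[5, 5, 5, 1], [6, 6, 6, 1]]]

b = [[1, 2]
   , [3, 4]
   , [5, 6]]

sess = tf.InteractiveSession()
# expand_dims turns b into shape 3*2*1 so that it broadcasts against the 3*2*4 tensor a.
p = tf.multiply(a, tf.expand_dims(b, axis=2))
print(p.eval())
sess.close()

Output, a 3*2*4 tensor (each of the 6 vectors in a is multiplied by its own weight):

[[[ 1  1  1  1], [ 4  4  4  2]]
, [[ 9  9  9  3], [16 16 16  4]]
, [[25 25 25  5], [36 36 36  6]]]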

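2. tensor_like = tf.ones_like(tensor) and tensor_like = tf.zeros_like(tensor)

tensor_like = tf.ones_like(tensor) creates a new tensor with the same type and shape as the given tensor, with all elements set to 1.

tensor_like = tf.zeros_like(tensor) creates a new tensor with the same type and shape as the given tensor, with all elements set to 0.

Example:

tensor = [[1, 2, 3], [4, 5, 6]]

tensor_like = tf.ones_like(tensor)

# sess is an open TensorFlow 1.x session, e.g. sess = tf.compat.v1.Session()
print(sess.run(tensor_like))

Output:

[[1 1 1]
 [1 1 1]]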

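1. arg, *args, **kwargs

args is short for arguments and stands for a variable-length list of positional arguments.

kwargs is short for keyword arguments and stands for a variable-length set of keyword arguments.

Only the asterisks in front of the names are required. The names themselves are free to choose; args and kwargs are just a common naming convention.

Usage:

*args packs the extra positional arguments into a tuple for use in the function body.

**kwargs packs the keyword arguments into a dict for use in the function body.

Note: the three kinds of parameters arg, *args, **kwargs must appear in a fixed order, namely (arg, *args, **kwargs); otherwise the program raises an error.

Example:

def function(arg, *args, **kwargs):
    print(arg, args, kwargs)

function(6, 7, 8, 9, a=1, b=2, c=3)

Output: 6 (7, 8, 9) {'a': 1, 'b': 2, 'c': 3}

*args and **kwargs are mainly used in function definitions to pass a variable number of arguments to a function. "Variable" here means that it is not known in advance how many arguments the caller will pass, which is exactly the situation these two forms are for.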
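The asterisks also work in the other direction, at the call site: * unpacks a sequence into positional arguments and ** unpacks a dict into keyword arguments. A minimal sketch, with the function returning its arguments instead of printing them so the result can be inspected; `numbers` and `options` are illustrative names.

```python
def function(arg, *args, **kwargs):
    # arg takes the first positional value, args the rest, kwargs the keywords.
    return arg, args, kwargs


numbers = [6, 7, 8, 9]
options = {"a": 1, "b": 2, "c": 3}

# Equivalent to function(6, 7, 8, 9, a=1, b=2, c=3).
result = function(*numbers, **options)

print(result)
```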

Python basics tutorial: Python 基础教程 | 菜鸟教程 (www.runoob.com)

Keras is a Python-based deep learning library that can be viewed as an API wrapping tensorflow. Keras Chinese documentation: https://keras-cn-twkun.readthedocs.io/

TensorFlow official documentation: TensorFlow官方文档_w3cschool (www.w3cschool.cn)