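In most beginner-oriented TensorFlow tutorials, authors recommend importing data into the model through feed_dict inside a session; feed_dict is a dictionary that feeds values to placeholders. TensorFlow offers a better and simpler alternative, though: with the tf.data API you can build a high-performance data pipeline in a few lines of code.

So what exactly is the advantage of tf.data? As the figure below shows, feed_dict is undeniably flexible, but whenever the GPU has to wait for the CPU to feed in data, it sits idle, so the program runs inefficiently.

A tf.data pipeline does not have this problem: it fetches the next batch ahead of time, which reduces overall idle time. On top of that, loading data in parallel or preprocessing it in advance makes the whole process faster still.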

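Building a small image pipeline in 5 minutes

To build a simple data pipeline we first need two objects: a tf.data.Dataset that stores the dataset, and a tf.data.Iterator that lets us extract samples from the dataset one at a time.

In an image pipeline, a tf.data.Dataset looks like this: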

[
    [Tensor(image), Tensor(label)],
    ...
]

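We can then use tf.data.Iterator to retrieve the image-label pairs one by one. In practice, several image-label pairs are usually grouped into a single element (a batch), so the iterator extracts a whole batch at a time.

As for the dataset itself, the Dataset API offers two ways to create one: from a source (such as a Python list of file names), or by applying a transformation to an existing dataset. Some examples: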

Dataset(list of image files) → Dataset(actual images)

Dataset(6400 images) → Dataset(64 batches with 100 images each)

Dataset(list of audio files) → Dataset(shuffled list of audio files)
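
The second transformation above is plain regrouping: 6400 elements in batches of 100 give 64 batches. The following is a minimal pure-Python sketch of that regrouping. The helper name `batched` is hypothetical, and this is not how tf.data implements batch(); it only illustrates the element counts.

```python
def batched(items, batch_size):
    """Group consecutive items into lists of at most batch_size elements."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Keep the final partial batch, as Dataset.batch() does by default
        yield batch


images = list(range(6400))  # integers standing in for 6400 images
batches = list(batched(images, 100))
print(len(batches), len(batches[0]))
```

If the dataset size is not a multiple of the batch size, the last batch is simply smaller.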

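Defining the computation graph

The small image pipeline looks roughly like the figure below:

All of this code lives in the computation graph definition, together with the model, loss, optimizer, and so on. First, we create a dataset from the list of files (the list is converted to a tensor and sliced into one element per file).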

import tensorflow as tf  # the examples use the TensorFlow 1.x API

# define list of files
files = ['a.png', 'b.png', 'c.png', 'd.png']

# create a dataset from filenames
dataset = tf.data.Dataset.from_tensor_slices(files)

Next, we define a function that loads an image (as a tensor) from its path, and call tf.data.Dataset.map() to apply that function to every element (file path) in the dataset. To run the function in parallel, pass the num_parallel_calls=n argument to map().

# Target side length in pixels (placeholder value; the original does not define image_size)
image_size = 224

def load_image(path):
    image_string = tf.read_file(path)
    # Don't use tf.image.decode_image, or the output shape will be undefined
    image = tf.image.decode_jpeg(image_string, channels=3)
    # This will convert to float values in [0, 1]
    image = tf.image.convert_image_dtype(image, tf.float32)
    image = tf.image.resize_images(image, [image_size, image_size])
    return image

# Apply the function load_image to each filename in the dataset
dataset = dataset.map(load_image, num_parallel_calls=8)
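
To get a feel for what a parallel map does, here is a rough pure-Python analogy using a thread pool. The function `fake_load` and its return value are made up for illustration, and this is not how tf.data executes map(); it only shows several calls running concurrently while the results still come back in input order.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_load(path):
    """Hypothetical stand-in for load_image: returns a small record, not an image."""
    return {"path": path, "name_length": len(path)}


paths = ['a.png', 'b.png', 'c.png', 'd.png']

# Up to 8 calls run at the same time; pool.map still yields results in input order
with ThreadPoolExecutor(max_workers=8) as pool:
    loaded = list(pool.map(fake_load, paths))

print([record["path"] for record in loaded])
```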

Then we use tf.data.Dataset.batch() to create batches:

# Create batches of 64 images each
dataset = dataset.batch(64)

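To reduce GPU idle time, we can add tf.data.Dataset.prefetch(buffer_size) at the end of the pipeline, where buffer_size is the number of batches to prefetch. We usually set buffer_size=1, but in some cases, especially when the time needed to process each batch varies, it can be worth increasing it a little.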

dataset = dataset.prefetch(buffer_size=1)
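
The idea behind prefetching can be sketched in plain Python with a background thread and a bounded queue. This is only an illustration of the producer/consumer pattern under simplifying assumptions (no error handling, hypothetical helper names `prefetch` and `fake_batches`); it does not reproduce TensorFlow's actual implementation.

```python
import queue
import threading


def prefetch(source, buffer_size=1):
    """Yield items from source while a background thread prepares the next ones."""
    buffer = queue.Queue(maxsize=buffer_size)
    end = object()  # sentinel marking the end of the source

    def producer():
        for item in source:
            buffer.put(item)  # blocks while the buffer is full
        buffer.put(end)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buffer.get()  # blocks only if nothing has been prepared yet
        if item is end:
            return
        yield item


def fake_batches(n):
    """Stand-in for CPU-side loading; the batch contents are dummy values."""
    for i in range(n):
        yield [i] * 4


consumed = list(prefetch(fake_batches(5), buffer_size=1))
print(consumed)
```

With buffer_size=1 the producer stays at most one batch ahead of the consumer, which mirrors the usual setting described above.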

Finally, we create an iterator to traverse the dataset. There are several kinds of iterator to choose from, but for most tasks we recommend an initializable iterator.

iterator = dataset.make_initializable_iterator()

Calling tf.data.Iterator.get_next() creates a tensor that acts like a placeholder: each time it is evaluated, TensorFlow fills it with the next batch of images.

batch_of_images = iterator.get_next()

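If you are migrating from the feed_dict approach, batch_of_images simply takes the place of the placeholder you used before.

Running the session

Now we can run the model as usual. Before each epoch, remember to run the iterator.initializer op, and catch the tf.errors.OutOfRangeError that is raised once the dataset has been exhausted.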

with tf.Session() as session:
    for i in range(epochs):
        # Re-initialize the iterator at the start of every epoch
        session.run(iterator.initializer)
        try:
            # Go through the entire dataset
            while True:
                image_batch = session.run(batch_of_images)
        except tf.errors.OutOfRangeError:
            print('End of Epoch.')
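
The control flow of this loop can be mirrored with an ordinary Python iterator, which may make the roles of the initializer and the exception easier to see. This is only an analogy with hypothetical names (`run_epochs`, `toy_batches`); no TensorFlow is involved.

```python
def run_epochs(make_iterator, epochs):
    """Return the number of batches seen in each epoch."""
    counts = []
    for _ in range(epochs):
        iterator = make_iterator()  # plays the role of running iterator.initializer
        seen = 0
        try:
            while True:
                next(iterator)  # plays the role of session.run(batch_of_images)
                seen += 1
        except StopIteration:  # plays the role of tf.errors.OutOfRangeError
            counts.append(seen)
    return counts


toy_batches = [[0, 1], [2, 3], [4]]
counts = run_epochs(lambda: iter(toy_batches), epochs=3)
print(counts)
```

Without the fresh iterator at the top of each epoch, every epoch after the first would see no batches at all.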

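The nvidia-smi command lets us monitor GPU utilization and find bottlenecks in the data pipeline. Under normal conditions, average GPU utilization should be above roughly 70-80%.

A more complete data pipeline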

Shuffle

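tf.data.Dataset.shuffle() is a commonly used Dataset method that randomizes the order of the elements in a dataset. Its buffer_size parameter specifies how many elements are shuffled at a time. We generally recommend a large value, ideally large enough to shuffle the entire dataset at once, because a buffer that is too small can introduce unexpected bias.

dataset = tf.data.Dataset.from_tensor_slices(files)
dataset = dataset.shuffle(len(files))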
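
Why a small buffer causes bias is easier to see with a toy model. The sketch below follows the behavior described in the TensorFlow documentation (fill a buffer, repeatedly emit a random element from it and replace it with the next input), but it is a simplified pure-Python illustration with a hypothetical name, `buffered_shuffle`, not TensorFlow's code.

```python
import random


def buffered_shuffle(items, buffer_size, seed=None):
    """Shuffle a stream using only a fixed-size buffer."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)  # still filling the buffer
            continue
        index = rng.randrange(len(buffer))
        yield buffer[index]   # emit a random buffered element
        buffer[index] = item  # and put the new element in its slot
    # Source exhausted: emit whatever is left in random order
    rng.shuffle(buffer)
    yield from buffer


data = list(range(20))
unshuffled = list(buffered_shuffle(data, buffer_size=1, seed=0))
local = list(buffered_shuffle(data, buffer_size=4, seed=0))
full = list(buffered_shuffle(data, buffer_size=len(data), seed=0))
print(unshuffled == data)
```

With buffer_size=1 the order does not change at all, and with buffer_size=4 an element can move at most 3 positions toward the front, so data that is sorted by class on disk stays largely sorted. Only a buffer as large as the dataset gives a full shuffle.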

Data augmentation

Data augmentation is a common way to enlarge a dataset. Frequently used functions include tf.image.random_flip_left_right(), tf.image.random_brightness(), and tf.image.random_saturation():

def train_preprocess(image):
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_brightness(image, max_delta=32.0 / 255.0)
    image = tf.image.random_saturation(image, lower=0.5, upper=1.5)
    # Make sure the image is still in [0, 1]
    image = tf.clip_by_value(image, 0.0, 1.0)
    return image
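
The final clipping step matters because a brightness shift can push pixel values outside [0, 1]. The following toy version works on a nested list of grayscale values instead of a tensor; the function name `toy_augment` and the sample image are made up for illustration, and saturation is left out because it needs color channels.

```python
import random


def toy_augment(image, max_delta=32.0 / 255.0, seed=None):
    """image: list of rows, each a list of grayscale floats in [0, 1]."""
    rng = random.Random(seed)
    # Flip left-right with probability 0.5
    if rng.random() < 0.5:
        image = [row[::-1] for row in image]
    # Shift brightness by a random amount in [-max_delta, max_delta]
    delta = rng.uniform(-max_delta, max_delta)
    shifted = [[pixel + delta for pixel in row] for row in image]
    # Clip back into [0, 1], as tf.clip_by_value does in train_preprocess
    return [[min(1.0, max(0.0, pixel)) for pixel in row] for row in shifted]


image = [[0.0, 0.5, 1.0],
         [0.2, 0.4, 0.98]]
augmented = toy_augment(image, seed=1)
print(augmented)
```

Pixels that start at 0.0 or 1.0 would leave the valid range for any nonzero shift in the wrong direction, which is exactly what the clip prevents.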

Labels

To load labels (or other metadata) along with the images, we only need to include them when creating the initial dataset:

# files is a python list of image filenames
# labels is a numpy array with label data for each image
dataset = tf.data.Dataset.from_tensor_slices((files, labels))

Make sure that every .map() function applied to the dataset passes the label data through:

def load_image(path, label):
    # load image
    return image, label

dataset = dataset.map(load_image)
