Rather than trying to explain every detail of the YOLO architecture here, in this article I'll show you a fairly simple approach to plugging in a custom dataset from the Google Open Images dataset and training a new model on it. If you're curious about the architecture itself, you can follow the author's webpage and the related articles.
I'm using the dataset provided by Google Open Images; the data looks like this:
You can also find the sample file in the project repo.
ImageID,Source,LabelName,Confidence,XMin,XMax,YMin,YMax,IsOccluded,IsTruncated,IsGroupOf,IsDepiction,IsInside,Class,x,y,width,height
078821d86db99fc9.jpg,freeform,/m/014j1m,1,0.244277,0.542485,0.09314,0.539359,1,0,0,0,0,Apple,0.393381,0.3162495,0.29820800000000003,0.44621900000000003
078821d86db99fc9.jpg,freeform,/m/014j1m,1,0.383319,0.6979920000000001,0.474829,0.9430149999999999,1,0,0,0,0,Apple,0.5406555000000001,0.7089219999999998,0.31467300000000004,0.46818599999999994
09a06df60e4bdd62.jpg,freeform,/m/014j1m,1,0.0,0.999692,0.0,0.9931639999999999,0,0,1,0,0,Apple,0.499846,0.49658199999999997,0.999692,0.9931639999999999
0fdea8a716155a8e.jpg,freeform,/m/014j1m,1,0.013999999999999999,0.991891,0.0,0.985716,0,0,1,0,0,Apple,0.5029454999999999,0.492858,0.977891,0.985716
105433f9d808d18e.jpg,freeform,/m/014j1m,1,0.12879300000000002,0.8589479999999999,0.102363,0.854827,0,0,0,0,0,Apple,0.4938705,0.478595,0.7301549999999999,0.752464
We downloaded the whole image dataset, around 5.x GB, using this.
We have two main CSV files, one for training and one for validation. Our goal is simply to train a model that can detect 12 different fruits available in Google Open Images and draw a bounding box around them. So we will filter our desired classes out of the main CSV files and create a smaller CSV that contains only the categories we're interested in.
import pandas as pd

df = pd.read_csv('train_data_description.csv')
val_df = pd.read_csv('validation_all.csv')

"""This is the list of classes we're interested in."""
list_of_class = ["Apple", "Banana", "Orange", "Pear", "Mango", "Pineapple",
                 "Lemon", "Watermelon", "Strawberry", "Grapefruit", "Peach",
                 "Pomegranate"]

def prepareData(dataframe: pd.DataFrame, list_of_class: list) -> pd.DataFrame:
    '''Accepts a dataframe of either train or validation data, splits our
    desired classes out of the original dataset, and creates a new dataset
    for our pre-processing purposes.'''
    dff = dataframe[dataframe['Class'].isin(list_of_class)]
    parts = []
    for cls in list_of_class:
        count = dff[dff["Class"] == cls]
        print("for class " + cls, count.shape)
        # We only take 1000 annotations per category to maintain the
        # categorical dispersity in the dataset.
        parts.append(count[:1000])
    data = pd.concat(parts)
    data['ImageID'] = data['ImageID'].apply(lambda a: a + '.jpg')
    if "Unnamed: 0" in data:
        data.drop(columns=["Unnamed: 0"], inplace=True)
    return data

train = prepareData(df, list_of_class)
val = prepareData(val_df, list_of_class)
train.to_csv('train_fruits_1.csv', index=False)
val.to_csv('val_fruits_1.csv', index=False)
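As an optional sanity check, you can confirm that the per-class cap of 1000 annotations took effect:

# Optional sanity check: no class should exceed 1000 rows after filtering.
print(train['Class'].value_counts())
print(val['Class'].value_counts())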
Converting Annotation Bbox to Yolo Format
So, with the train and validation CSVs generated from the above code, we can now move on to making the data suitable for YOLO. YOLO doesn't use the same annotation format as object detection models like Faster-RCNN in the TensorFlow model zoo. Instead of the corner coordinates (XMin, XMax, YMin, YMax) in our new CSVs, YOLO needs a normalized, center-based format:
<class_number> (<absolute_x> / <image_width>) (<absolute_y> / <image_height>) (<absolute_width> / <image_width>) (<absolute_height> / <image_height>)
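Since the Open Images coordinates are already normalized to [0, 1], the divisions by image width and height above are effectively already done for us. As a quick worked example, here is the computation for the first Apple row of the sample CSV:

# Worked example using the first Apple row from the sample CSV above.
# Open Images box coordinates are already normalized to [0, 1], so the
# YOLO center/size values follow directly from the min/max columns.
xmin, xmax = 0.244277, 0.542485
ymin, ymax = 0.09314, 0.539359

x_center = (xmin + xmax) / 2  # 0.393381
y_center = (ymin + ymax) / 2  # 0.3162495
width = xmax - xmin           # 0.298208
height = ymax - ymin          # 0.446219

These match the x, y, width, and height columns already present in the sample rows above.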
The following code does the calculation and gives us the required values:
# The following code is a modified version of the code available here:
# https://blog.goodaudience.com/part-1-preparing-data-before-training-yolo-v2-and-v3-deepfashion-dataset-3122cd7dd884
import cv2

def convert_labels(path, x1, y1, x2, y2):
    """
    Converts the (XMin, YMin, XMax, YMax) annotation format to the
    (x, y, width, height) normalized YOLO format. Since the Open Images
    coordinates are already normalized to [0, 1], no division by the image
    size is needed; we only read the image to make sure it exists.
    """
    def sorting(l1, l2):
        # Returns (max, min) of the two values.
        return (l1, l2) if l1 > l2 else (l2, l1)

    size = get_img_shape(path)
    if size[0] is None:
        print(path, "not found")
        return '', '', '', ''
    xmax, xmin = sorting(x1, x2)
    ymax, ymin = sorting(y1, y2)
    x = (xmin + xmax) / 2.0  # box center x
    y = (ymin + ymax) / 2.0  # box center y
    w = xmax - xmin          # box width
    h = ymax - ymin          # box height
    return (x, y, w, h)

def get_img_shape(path):
    img = cv2.imread(path)
    try:
        return img.shape
    except AttributeError:
        print('error! ', path)
        return (None, None, None)
Let's append the new columns, holding the converted annotations, to the existing training and validation dataframes.
from tqdm import tqdm
tqdm.pandas()  # enables progress_apply on dataframes

# Unpack the four returned values into new columns in one line.
train['x'], train['y'], train['width'], train['height'] = \
    zip(*train.progress_apply(lambda row: convert_labels(
        "test_yolo/eval/" + row['ImageID'],
        row['XMin'], row['YMin'], row['XMax'], row['YMax']), axis=1))
train.to_csv('test_yolo/fruits_1.csv', index=False)

val['x'], val['y'], val['width'], val['height'] = \
    zip(*val.progress_apply(lambda row: convert_labels(
        "test_yolo/eval/" + row['ImageID'],
        row['XMin'], row['YMin'], row['XMax'], row['YMax']), axis=1))
val.to_csv('test_yolo/fruits_val_1.csv', index=False)
Let Us Now Test The Conversion:
import cv2
import numpy as np
import matplotlib.pyplot as plt

def from_yolo_to_cor(box, shape):
    """Converts a normalized (x, y, width, height) YOLO box back to
    (x1, y1, x2, y2) pixel corner coordinates for drawing."""
    img_h, img_w, _ = shape
    x, y, w, h = box
    x1, y1 = int((x - w / 2) * img_w), int((y - h / 2) * img_h)
    x2, y2 = int((x + w / 2) * img_w), int((y + h / 2) * img_h)
    return x1, y1, x2, y2

def draw_boxes(img, boxes):
    shape = np.array([768, 1024, 3])
    for box in boxes:
        x1, y1, x2, y2 = from_yolo_to_cor(box, shape)
        print(x1, y1, x2, y2)
        cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 3)
    plt.imshow(img)
    plt.show()

imgs = train[train["ImageID"] == "000d9c59687b509b.jpg"]
# This is the list of values associated with that image id. Here the code can be improved.
boxes = [[0.189062, 0.189584, 0.378125, 0.379167],
         [0.57625, 0.622084, 0.5925, 0.485833],
         [0.36625, 0.509583, 0.03875, 0.0525]]
draw_boxes(cv2.imread("test_yolo/train/000d9c59687b509b.jpg"), boxes)
The output should look like this:
The converted annotation box:
The original annotation:
# The original annotation values, straight from the CSV, are in the
# (XMin, XMax, YMin, YMax) corner format, so we scale them directly
# instead of going through from_yolo_to_cor.
img = cv2.imread("test_yolo/train/000d9c59687b509b.jpg")
original = [[0.000000, 0.378125, 0.000000, 0.379167],
            [0.280000, 0.872500, 0.379167, 0.865000],
            [0.346875, 0.385625, 0.483333, 0.535833]]
for xmin, xmax, ymin, ymax in original:
    cv2.rectangle(img, (int(xmin * 1024), int(ymin * 768)),
                  (int(xmax * 1024), int(ymax * 768)), (0, 255, 0), 3)
plt.imshow(img)
plt.show()
Sorry, I was too lazy to automate picking the values for this test; you can select any rows from the dataframe and plug them in to visualize the result. I just copied one row and pasted it there as a list of lists.
Class to Integer Labelling, an Example of Bad Code: Now let us convert the class labels to integers, since YOLO requires each class to be represented as an integer.
def class_text_to_int(row_label):
    if row_label == 'Apple':
        return 1
    elif row_label == 'Banana':
        return 2
    elif row_label == 'Orange':
        return 3
    elif row_label == 'Pear':
        return 4
    elif row_label == 'Mango':
        return 5
    elif row_label == 'Pineapple':
        return 6
    elif row_label == 'Lemon':
        return 7
    elif row_label == 'Watermelon':
        return 8
    elif row_label == 'Strawberry':
        return 9
    elif row_label == 'Grapefruit':
        return 10
    elif row_label == 'Peach':
        return 11
    elif row_label == 'Pomegranate':
        return 12

train["ClassInt"] = train["Class"].progress_apply(lambda row: class_text_to_int(row))
val["ClassInt"] = val["Class"].progress_apply(lambda row: class_text_to_int(row))
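A tidier alternative, sketched below, builds the mapping from the list_of_class we already defined. One caveat worth flagging: darknet expects class IDs to be zero-based (0 to classes - 1), matching the line order in fruits.names, so the sketch enumerates from 0 rather than 1:

# A tidier sketch: build the name-to-id mapping from list_of_class.
# Darknet expects zero-based class IDs that match the line order in
# fruits.names, so we enumerate from 0 here.
class_to_int = {name: idx for idx, name in enumerate(list_of_class)}

train["ClassInt"] = train["Class"].map(class_to_int)
val["ClassInt"] = val["Class"].map(class_to_int)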
Create a txt file for each image containing ClassId, centerX, centerY, Width, Height
# Loop through the dataframe and create the related txt file for each image.
path = "PATH_TO_YOUR_FOLDER"

def createTxt(path: str, data: pd.DataFrame):
    for _, row in data.iterrows():
        towrite = str(row["ClassInt"]) + " " + str(row["x"]) + " " + str(row["y"]) \
            + " " + str(row["width"]) + " " + str(row["height"])
        # Open in append mode: an image can have several boxes, and each
        # box must go on its own line of the same txt file.
        with open(path + row["ImageID"][:-4] + ".txt", "a") as wr:
            wr.write(towrite + "\n")

createTxt('train_data_path', train)
createTxt('val_data_path', val)
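If you'd rather avoid append mode, a groupby-based sketch like the following writes all of an image's boxes in one go (the column names follow the dataframes we built above):

# Alternative sketch: group the rows by image and write each image's
# boxes at once, one "class x y width height" line per box.
def create_txt_grouped(path: str, data: pd.DataFrame):
    cols = ["ClassInt", "x", "y", "width", "height"]
    for image_id, group in data.groupby("ImageID"):
        lines = [" ".join(str(v) for v in row) for row in group[cols].values]
        with open(path + image_id[:-4] + ".txt", "w") as wr:
            wr.write("\n".join(lines))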
Getting back to YOLO
Clone the darknet repo on your local machine:
git clone https://github.com/AlexeyAB/darknet.git
Build darknet, following the instructions:
cd darknet
# Open up the Makefile and make the necessary edits: change GPU=0 to GPU=1
# if you're building darknet for GPU, and if you have OpenCV installed,
# set OPENCV=0 to OPENCV=1. Mind that for the images to be displayed, you
# need OpenCV installed on your machine before building darknet.
make
# ./darknet should now let you access your darknet installation.
Download the Yolo Weight File
Test the installation by running the detector:
./darknet detect cfg/yolo.cfg yolo.weights data/dog.jpg
Also, please mind the paths of the files and of the installation. At this point, if everything is working fine, let us create the new configuration files for our training.
Let's create a fruits.cfg file inside the cfg folder of the darknet installation. The sample .cfg file for our purpose is:
[net]
# Testing
#batch=1
#subdivisions=1
# Training
batch=64
subdivisions=8
width=416
height=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.001
burn_in=1000
max_batches = 500200
policy=steps
steps=400000,450000
scales=.1,.1

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

#######

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky

[route]
layers=-9

[convolutional]
batch_normalize=1
size=1
stride=1
pad=1
filters=64
activation=leaky

[reorg]
stride=2

[route]
layers=-1,-4

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=85
activation=linear

[region]
anchors = 0.57273, 0.677385, 1.87446, 2.06253, 3.33843, 5.47434, 7.88282, 3.52778, 9.77052, 9.16828
bias_match=1
classes=12
coords=4
num=5
softmax=1
jitter=.3
rescore=1
object_scale=5
noobject_scale=1
class_scale=1
coord_scale=1
absolute=1
thresh = .6
random=1
We changed the number of classes and the value of filters in the last convolutional layer. These are the necessary changes:
filters = (classes + 5) * 5, in our case filters = (12 + 5) * 5 = 85
These parameters are described in the referenced URL.
Now let us put the class names in a separate file called fruits.names in the same cfg directory. The following classes represent our new data:
Apple
Banana
Orange
Pear
Mango
Pineapple
Lemon
Watermelon
Strawberry
Grapefruit
Peach
Pomegranate
Also, let's create fruits.data to point to the configuration files:
classes=12
train=/home/vaghawan/test_yolo/train.txt
valid=/home/vaghawan/test_yolo/test.txt
names=/var/www/darknet/cfg/fruits.names
backup=backup/
For train and valid, the paths should point to text files that list the full path of each training and validation image, one per line; darknet then looks for each image's annotation .txt file next to the image itself, so the per-image .txt files we generated must sit alongside the images.
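For reference, here is a minimal sketch (the folder paths are placeholders matching the fruits.data above) that generates such a list file:

# Minimal sketch: write one absolute image path per line, which is the
# list format darknet expects for the train= and valid= entries.
import glob

image_paths = sorted(glob.glob('/home/vaghawan/test_yolo/train/*.jpg'))
with open('/home/vaghawan/test_yolo/train.txt', 'w') as f:
    f.write('\n'.join(image_paths))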
Finally, it’s time to run the training:
./darknet detector train cfg/fruits.data cfg/fruits.cfg darknet19_448.conv.23
Here, you need to make sure you're giving the correct path to the pre-trained weight file (darknet19_448.conv.23). If everything went OK, you will see the new weight files saved in the backup directory. You can test the prediction with the backup.weights file:
./darknet detector test cfg/fruits.data cfg/fruits.cfg backup/backup.weights data/mango.jpg
The output should look like this:
The main purpose of this article was to show how you can convert the annotated data available in Google Open Images to the YOLO format and use it to train a YOLO model. Let me know if you encounter any errors throughout the process.