Vision Setup and Basics Workshop

Getting Started with Vision

Agenda

Please fill out this attendance form. The responses are used to track the demographics of the club as well as to record attendance.

This is the agenda for the Vision Setup and Basics workshop, held October 23, 2018 from 6:45 PM to 9:00 PM. The times listed below are approximations, so expect each topic to take an additional ±10 minutes.

  • Development Setup (10 mins)
  • Vision Pipeline Overview (10 mins)
  • Synthetic Target Generation (5 mins)
  • Subteam Interest Form (5 mins)
  • Reminders
  • Free Time (70 mins)

Development Setup

We use Docker to deploy the vision pipeline on different platforms, but installing Python 3 and the requisite machine learning packages is recommended if you want to develop for the Vision Subteam.

Python

Python Virtual Environments

It is recommended that you install the vision system dependencies with pip inside of a Python virtual environment instead of installing packages globally. This isolates the packages required by the UAS vision system and ensures that updates and changes to your local system will not break compatibility.

Using a virtualenv

Once you follow the instructions below to create a virtual environment, you can enter it by running:

source <venv_dir>/bin/activate

And leave the environment by running:

deactivate

When you are within a virtualenv, you may notice that python is aliased to the version of Python you created the virtualenv with. Additionally, when you run pip list, the listed packages are not the same as the ones installed on your system. When you run pip install, packages are not installed globally on your system; instead, they are saved in the virtualenv directory that you created.
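
If you ever want to double-check which environment is active, a quick sanity check (a minimal sketch using only the standard library, not part of the vision codebase) is to ask Python itself:

>>> import sys
>>> sys.prefix != sys.base_prefix   # True when a virtualenv is active
True

sys.prefix itself will point at the virtualenv directory (for example, the uas_venv folder created below) rather than at your system installation.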

Windows (WSL with Ubuntu)

  1. Open a terminal and run:
sudo apt-get update
sudo apt-get install python3-venv
  2. Navigate to the drone_code folder and create a new virtualenv:
python3.6 -m venv uas_venv
  3. Enter the virtual environment:
source uas_venv/bin/activate
  4. Install the dependencies of the vision system:
pip install -r src/vision/build/vision_requirements.txt

Mac

While macOS does come preinstalled with Python 2.7, our vision code and its dependencies rely on Python 3.6. You must install Python 3.6 using this helpful guide and create a virtual environment with the following steps:

  1. Navigate to the drone_code folder and create a new virtualenv:
python3.6 -m venv uas_venv
  2. Enter the virtual environment:
source uas_venv/bin/activate
  3. Install the dependencies of the vision system:
pip install -r src/vision/build/vision_requirements.txt

Linux

Depending on your distribution, use your preferred package manager to install Python 3.6 and create a virtualenv as follows:

  1. Navigate to the drone_code folder and create a new virtualenv:
python3.6 -m venv uas_venv
  2. Enter the virtual environment:
source uas_venv/bin/activate
  3. Install the dependencies of the vision system:
pip install -r src/vision/build/vision_requirements.txt

Docker

The vision pipeline can also be run with Docker. If you make any changes to the vision pipeline, you can build the Docker image using:

./uas.sh vision build

This builds the uas-at-ucla_vision Docker image, which acts as a wrapper for the vision.py file.

Vision Pipeline Overview

High-Level Diagram

Vision Pipeline High-Level Diagram

Detailed Diagram

Vision Pipeline Diagram

Running the Vision Pipeline

The vision system uses SSH to transfer files between the drone and the primary ground station computer and also to sync files between the primary server and the worker clients. As a result, some configuration is required to run the vision system on your machine, even if the entire pipeline is being run on the same machine.

Requirements for running the full pipeline:

  1. One or more known_hosts file(s) containing the public key(s) for the drone computer (your computer if you're simulating a drone) and the computer running the snipper worker. Once you have these files, make sure they are contained within the drone_code/src/vision directory and specify the (docker) path when running the vision server and clients.
  2. One or more files containing the private key(s) for the user(s) with permission to access the image files on both the vision server and the drone. These files must be contained within the drone_code/src/vision directory and specified when running the vision server and clients.

Once you have built the vision Docker image by running:

./uas.sh vision build

you can run the Docker image using:

./uas.sh vision run <arguments>

The uas.sh script will pass any arguments after run to the Docker image, which passes them to the vision.py script. Although the vision pipeline is designed to run on multiple connected computers, it is possible to run both the clients and the primary server on a single machine. By default, the uas.sh script will bind the local data directory to the drone_code/src/vision/data_local folder.

Examples

To run the primary vision server with default settings:

./uas.sh vision run server --verbose

To run the rsync client with default settings:

./uas.sh vision run client --verbose rsync

For more information about the options, you can use the --help flag after one of the positional arguments.
For example, if you want to know what types of clients you can run:

./uas.sh vision run client --help

PIL(low) Introduction

Pillow is one of the libraries we use to generate synthetic images, and we may also use it this year to preprocess images for the pipeline. It is a continuation of PIL (Python Imaging Library), whose last release was in 2009 for Python 2.6. If you are already familiar with this library, feel free to skip this section. If you want to learn the basics, you can follow our abridged tutorial or the official tutorial here.
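
As a small taste of the API (a minimal sketch; the filenames here are placeholders), opening, transforming, and saving an image looks like this:

>>> from PIL import Image
>>> im = Image.open('example.jpg')           # load an image from disk
>>> small = im.resize((100, 100))            # resample to 100x100 pixels
>>> rotated = small.rotate(45, expand=True)  # rotate 45 degrees, growing the canvas to fit
>>> rotated.save('example_rotated.png')      # output format is inferred from the extension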

YOLO Demonstration

For the 2018 competition we used YOLO (You Only Look Once) as our localizer to identify and isolate targets from the raw images. Although YOLO is capable of both localization and classification we found that using a separate CNN was more accurate for pure classification.

Darknet Demo

To try out YOLO for yourself, you can follow the directions on this page to run Darknet on some sample images. The steps on the page are listed below for convenience:

  1. Clone the repository to some location and navigate to it:
git clone https://github.com/pjreddie/darknet
cd darknet
  2. Download the pretrained weights:
wget https://pjreddie.com/media/files/yolov3.weights
  3. Compile the model:
make
  4. Run the model:
./darknet detect cfg/yolov3.cfg yolov3.weights data/dog.jpg

You should see some output indicating what objects it recognized. The bounding boxes for these objects are saved as predictions.jpg in the same folder.
YOLO Demo Picture
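
If you would rather open the result from Python, Pillow (introduced above) can do it in one line, assuming you run this from the darknet directory:

>>> from PIL import Image
>>> Image.open('predictions.jpg').show()   # opens the annotated image in your default viewer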

Target Generator / Keras Demonstration

The shape classifier we used for the 2018 competition is based on the VGG16 model from the University of Oxford's Visual Geometry Group (VGG).
Our letter classifier uses a regular CNN to identify letters in a similar way. To train these networks, we create synthetic targets by overlaying computer-generated targets on top of background images taken from Google Maps.
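
The real generator is the gen.py tool described below; purely as an illustration of the overlay idea (not the actual implementation, and with placeholder filenames), compositing a shape onto a background with Pillow looks roughly like this:

from PIL import Image, ImageDraw

# Illustrative sketch only -- the real generator is src/vision/targets/gen.py.
background = Image.open('background.png').convert('RGBA')  # e.g. a crop of aerial imagery

# Draw a simple filled shape with a letter on a transparent canvas.
target = Image.new('RGBA', (20, 20), (0, 0, 0, 0))
draw = ImageDraw.Draw(target)
draw.rectangle([0, 0, 19, 19], fill=(255, 255, 255, 255))  # the "shape"
draw.text((6, 4), 'A', fill=(0, 0, 255, 255))              # the "letter"

# Rotate the target and paste it onto the background, using its alpha
# channel as the mask so the empty corners stay transparent.
target = target.rotate(30, expand=True)
background.paste(target, (5, 5), target)
background.save('synthetic_target.png')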

In this section, we will demonstrate how to use the target generator tool to create synthetic data, and then we will use Keras to build a simple CNN that classifies targets.

Using the Target Generator Tool

To build effective machine learning models, we need a large amount of high-quality data to work with. However, it is not practical for us to fly our drone out at an airfield and take the tens of thousands of photos of unique targets we need to train our models. As a compromise, we instead generate synthetic targets using our own custom target generator script. You can find this tool in the drone_code/src/vision/targets folder.

As with the vision pipeline, more detailed usage information can be found by running:

python3 gen.py --help

Setup

To generate targets, the target generator tool requires additional background images that are not included in the git repository. For the sake of time, the background image we will overlay our targets onto in this demonstration will be a plain black square.

As a quick review, let's try using Pillow to create this background image. If you want to practice yourself, try creating a solid black PNG image with dimensions of at least 35px by 35px.
Solution:

>>> from PIL import Image
>>> im = Image.new(mode='RGBA', size=(50, 50), color=(0, 0, 0, 255))
>>> im.save('black50x50.png')

Move this file to a subdirectory of drone_code/src/vision/targets such as drone_code/src/vision/targets/background (default).

Generating our Data

Now we want to generate the data we will use to train our network. For this demo, we will intentionally make our data less true-to-life and easier to classify.

Inside the targets directory run:

python3 gen.py -n 3000 --shape-color white --letter-color white --target-size 20 \
    --image-size 35 35 -d training_data --transform rotate

This will generate 3000 annotated training images and save them to a folder called training_data.

Now we want to generate our test data:

python3 gen.py -n 100 --shape-color white --letter-color white --target-size 20 \
    --image-size 35 35 -d test_data --transform rotate

This is the data we will use to evaluate our trained network.
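
Each generated .png image comes with a matching .xml annotation. If you want to peek at one (a minimal sketch, and the filename shown is just an example), you can read the shape label the same way the training code below does:

>>> import xml.etree.ElementTree as ET
>>> tree = ET.parse('training_data/target0.xml')   # pick any generated annotation file
>>> print(tree.find('object').find('name').text)   # the shape class used as the label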

Building the CNN

Keras is a high-level API for building neural networks that abstracts away the small details. We will be using TensorFlow's implementation of the Keras API to create a small, simple CNN to classify the images we just generated.

Here is a function that builds a simple CNN model:

def model():
    # Two strided convolutions downsample the input, followed by a fully
    # connected layer and a 13-way softmax (one output per shape type).
    model = tf.keras.Sequential()
    model.add(
        Conv2D(32, kernel_size=(5, 5), strides=(2, 2), activation='relu'))
    model.add(
        Conv2D(64, kernel_size=(5, 5), strides=(2, 2), activation='relu'))
    model.add(Flatten())
    model.add(Dense(1000, activation='relu'))
    model.add(Dense(13, activation='softmax'))
    return model
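
If you want to see the layer output shapes and parameter counts, a quick sketch (assuming the imports from the Reference section at the bottom and the 35x35 grayscale inputs we generated) is:

m = model()
m.build(input_shape=(None, 35, 35, 1))  # a batch of 35x35 single-channel images
m.summary()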

Training the CNN

Formatting the Data

Before we can train the model, we need to format the data we just generated into a form that the model can use.

def format_data(file_dir):
    # Load every generated .png as a normalized grayscale array and build
    # a one-hot label vector from the shape name in the matching .xml.
    x_vals = []
    y_vals = []
    for file in os.scandir(file_dir):
        if file.name.split('.')[1] == 'png':
            # Read the image as grayscale, scale to [0, 1], and add a
            # channel axis so each sample has shape (height, width, 1).
            x_vals += [
                np.expand_dims(
                    cv2.imread(
                        file_dir + '/' + file.name,
                        flags=cv2.IMREAD_GRAYSCALE).astype('float32')/255,
                    axis=2)
            ]

            # The annotation .xml shares the image's base name; the shape
            # class is stored in the object/name element.
            xmldata = ET.parse(file_dir + '/' + file.name.split('.')[0] +
                               '.xml')
            shape_name = xmldata.find('object').find('name').text
            y_val = np.zeros(13)
            y_val[TARGET_TYPES.index(shape_name)] = 1
            y_vals += [y_val]

    return (np.array(x_vals), np.array(y_vals))
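
For example, with the data generated earlier (the path depends on where you run the script from), the resulting array shapes should match our 35x35 grayscale images and 13 shape classes:

xtrain, ytrain = format_data('training_data')
print(xtrain.shape)  # expected (3000, 35, 35, 1)
print(ytrain.shape)  # expected (3000, 13)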

Training the Model

To train the model, we assign it an optimizer (Adam, a variant of gradient descent) and a loss function (categorical cross-entropy).

def train(model, xtrain, ytrain, xtest, ytest):
    # Compile with the Adam optimizer and categorical cross-entropy loss,
    # tracking categorical accuracy during training.
    model.compile(
        optimizer=tf.train.AdamOptimizer(0.01),
        loss=tf.keras.losses.categorical_crossentropy,
        metrics=[tf.keras.metrics.categorical_accuracy])

    # Train for 4 epochs in batches of 64, validating against the test
    # set after each epoch.
    model.fit(
        xtrain,
        ytrain,
        epochs=4,
        batch_size=64,
        validation_data=(xtest, ytest))
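
After training, you can double-check the final accuracy on the held-out test set (a short sketch using the same compiled model and the arrays returned by format_data):

loss, accuracy = model.evaluate(xtest, ytest)
print(accuracy)  # categorical accuracy on the 100 generated test images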

Testing the CNN

Here is a script to preview the results.

import cv2
import tensorflow as tf
import keras_simple   # the training script (the Reference implementation below), which defines model() and TARGET_TYPES
import sys
import numpy as np

if __name__ == '__main__':
    # Rebuild the model and load previously trained weights. Adjust the
    # path to wherever you saved them (the reference script below saves
    # to ./keras_demo_weights).
    model = keras_simple.model()
    model.load_weights('./weights/simpleweights')
    # Read the image the same way the training data was formatted:
    # grayscale, scaled to [0, 1], with channel and batch axes added.
    im = np.expand_dims(np.expand_dims(cv2.imread(
        sys.argv[1], flags=cv2.IMREAD_GRAYSCALE).astype('float32') / 255, axis=2), axis=0)
    prediction = model.predict(im)
    print(np.argmax(prediction))                              # predicted class index
    print(keras_simple.TARGET_TYPES[np.argmax(prediction)])   # predicted shape name
    # Display the input image until a key is pressed.
    cv2.imshow('im', cv2.imread(sys.argv[1], flags=cv2.IMREAD_GRAYSCALE))
    cv2.waitKey(0)

Reference

Here is the full implementation to train the model:

import tensorflow as tf
from tensorflow.keras.layers import Conv2D, Flatten, Dense
import os
import cv2
import numpy as np
import xml.etree.ElementTree as ET

TARGET_TYPES = [
    "Circle", "SemiCircle", "QuarterCircle", "Triangle", "Square", "Rectangle",
    "Trapezoid", "Pentagon", "Hexagon", "Heptagon", "Octagon", "Star", "Cross"
]


def model():
    model = tf.keras.Sequential()
    model.add(
        Conv2D(32, kernel_size=(5, 5), strides=(2, 2), activation='relu'))
    model.add(
        Conv2D(64, kernel_size=(5, 5), strides=(2, 2), activation='relu'))
    model.add(Flatten())
    model.add(Dense(1000, activation='relu'))
    model.add(Dense(13, activation='softmax'))
    return model


def train(model, xtrain, ytrain, xtest, ytest):
    model.compile(
        optimizer=tf.train.AdamOptimizer(0.01),
        loss=tf.keras.losses.categorical_crossentropy,
        metrics=[tf.keras.metrics.categorical_accuracy])

    model.fit(
        xtrain,
        ytrain,
        epochs=4,
        batch_size=64,
        validation_data=(xtest, ytest))


def format_data(file_dir):
    x_vals = []
    y_vals = []
    for file in os.scandir(file_dir):
        if file.name.split('.')[1] == 'png':
            x_vals += [
                np.expand_dims(
                    cv2.imread(
                        file_dir + '/' + file.name,
                        flags=cv2.IMREAD_GRAYSCALE).astype('float32')/255,
                    axis=2)
            ]

            xmldata = ET.parse(file_dir + '/' + file.name.split('.')[0] +
                               '.xml')
            shape_name = xmldata.find('object').find('name').text
            y_val = np.zeros(13)
            y_val[TARGET_TYPES.index(shape_name)] = 1
            y_vals += [y_val]

    return (np.array(x_vals), np.array(y_vals))


if __name__ == '__main__':
    model = model()
    (xtrain, ytrain) = format_data('../training_data')
    (xtest, ytest) = format_data('../test_data')
    train(model, xtrain, ytrain, xtest, ytest)
    model.save_weights('./keras_demo_weights')

Subteam Interest Form

Please fill out the subteam interest form.

Reminders

General Meeting is this Thursday (10/25) at 6:00 PM in Boelter 4413.

Fill out the subteam interest form before Thursday.

Create an account for the OpenProject website before Thursday.

Free Time

Resource

See this textbook website; the PDF copy is free. Skim through Chapters 1, 3, 4, and 10. These chapters are short reads, and we expect you to have a high-level understanding of the concepts they cover.