User:Yayan

Public application template

A: An introduction

  • Please describe yourself in three sentences, one of them regarding your current studies.

My name is Brayan Stiven Zapata Impatá and I am a native Spanish speaker, but I am also proficient in English. I am currently a PhD student in Robotics and Machine Learning at the University of Alicante, Spain. My background includes a Bachelor's degree in Computer Engineering, specialised in computer vision and artificial intelligence, and a Master's degree in Computer Engineering.

  • Why do you want to participate in the Google Summer of Code? What do you hope to gain by doing so?

My desire to take part in GSoC is driven mainly by the need to further develop my programming skills and to work on a real open project. I have worked on many private research projects and, although research is carried out with the idea of helping our society grow, it is usually very reluctant to share its results. This is my first time participating in GSoC but I hope to learn a lot from it!

  • Why are you choosing Octave?

On my very first day at college, when I began the journey to get my Bachelor's degree, the first thing I was introduced to was Octave. Some time has passed but I still find the Octave project quite a useful resource; I even used it for my Bachelor's thesis. Since I feel indebted to this tool and its community, I want to give it some of my time and knowledge in return. Furthermore, I would like to make this happen because by doing so the Octave project would become more attractive in this hot topic, so many others would join this community.

C: Contact

  • Please state the (unique and identical where possible) nick you use on IRC and any other communication channel related to Octave.

My IRC nickname is yayan and my e-mail address is brayan.inf@gmail.com. My GitHub profile can be found at https://github.com/yayaneath.

  • Which time zone do you live in? Will that change over GSoC duration?

My time zone is UTC+1 and it will not change during GSoC.

  • Please state the timeframe (in UTC+0) when you feel most comfortable working during GSoC. Where are your time buffers?

To be honest, since I am currently in a PhD program, my workload is quite variable. However, I plan to work on GSoC from 07:00 to 10:00 and from 17:00 to 21:00 on weekdays. In any case, I am very flexible and the excitement of working on this will surely lead to many more hours, even on weekends. I worked in a private company for a year and a half while completing my Master's studies, so I am quite good at managing my time.

E: Coding experience

This part is one of the more important ones in your application. You are allowed to be as verbose as you want, as long as you stay on topic ;-)

  • Please describe your experience with C++, Octave or Matlab m-scripts, OpenGL and Qt.

I have been coding in C/C++ since I was in high school and I keep writing C++ code every day. As an example, during the last term of my Bachelor's degree I developed a Machine Learning toolkit together with some of my classmates. I was in charge of implementing a Support Vector Machine class with three kernels, a Genetic Algorithm with some self-created mutation operators, the k-Means clustering algorithm and a cross-validation method. Then, I used this toolkit to create a broker bot that could predict movements in the stock market.

Regarding Octave/Matlab, as I said before, I have been using m-scripts for 6 years now and my Bachelor's thesis was written solely in this language. It mainly consisted of writing preprocessing functions for treating medical signals and then implementing the Particle Swarm Optimization algorithm to optimise the training set used for a Machine Learning model. I have also used it to apply simple filters to images. Nowadays I keep using it as a prototyping environment. Finally, I once took a 4-hour course at college to learn about OpenGL, but I have no further experience with it or with Qt.

  • Please describe your experience with other programming languages.

I have experience working with Python 2.7.6. For my Master's thesis I created an open tool for reusing open research data and the scripting was written entirely in Python. These scripts mainly used libraries like matplotlib, numpy and pandas. In addition, I used TensorFlow through its Python API to implement a simple logistic regression model. As a side project, just for the sake of learning, I once used Python and Keras (on top of TensorFlow) to build a cat/dog image classifier by training a Convolutional Neural Network.

I also have experience with Java, although I don't use it frequently. The same could be said about Scala, Go, R and FORTRAN, which I have used sporadically.

  • Please describe your experience with being in a development team.

I am new to the Open Source community. I have experience working at a company (one year and a half) where we tried to apply the Scrum agile methodology, so I gained some fluency with Git and with building/packaging processes. I feel very comfortable working with others.

  • Please describe the biggest project you have written code for and what you learned by doing so. Also describe your role in that project over time.

I think the biggest project I have worked on was the Machine Learning toolkit I built with four other colleagues in the last term of my Bachelor's degree. As I wrote before, I was in charge of implementing and testing several algorithms like SVM (linear, Gaussian and sigmoid kernels), k-Means, GA and cross-validation. I learnt a lot about Machine Learning and software development practices, and I also had my first contact with great libraries like Boost.

  • Please state the commits and patches you already contributed to Octave.

So far I have only been helping others in the IRC channel with general Octave questions. I plan to start contributing very soon, but I am still deciding whether to contribute missing Matlab functions to the Image package or to try to revive the unmaintained nnet package.

F: Feeling fine

  • Please describe (in short) your experience with the following tools:
    • IRC and mailing lists

This is my first time using IRC and mailing lists.

    • Mercurial or other source code management systems

I have experience using Git from college and from my previous job.

    • Mediawiki or other wiki software

This is my first time using this kind of wiki software as well.

    • make, gcc, gdb or other development tools

I use CMake and gcc every day. I am not very familiar with gdb (apart from some college assignments) but I have experience with debugging through the Eclipse IDE.

  • What will make you actively stay in our community after this GSoC is over?

I believe that I will stay in this community because I really want to help such a useful resource grow and, so far, I have been enjoying my time here. I think I can complement it with more knowledge about Machine Learning and general Artificial Intelligence techniques that could be integrated in a new package. This GSoC project could be the beginning.

O: Only out of interest

  • Did you ever hear about Octave before?

As I said before, Octave was the first tool I was introduced to on my first day at college. I even used it for the code of my Bachelor's thesis.

  • What was the first question concerning Octave you could not find an answer to rather quickly?

I had a hard time implementing some signal processing functions, but I think that was due to my lack of knowledge about the topic.

P: Prerequisites

  • Please state the operating system you work with.

I mostly use Ubuntu 16.04, but I also have machines running Ubuntu 14.04, Windows 10 and Mac OS X. I can access all of them at any time.

  • Please estimate an average time per day you will be able to access

I have access to an internet connection the whole day, not only on a computer but also on my smartphone. I spend about 10 hours per day in front of my computers and they are all synchronised, so I can access my work in progress at any time.

  • Please describe the degree to which you can install new software on computers you have access to.

I can install anything on them.

S: Self-assessment

  • Please describe how useful criticism looks from your point of view as committing student.

Constructive criticism is a key factor for improving one's skills and for creating better things. Someone once told me that if no one says anything about your work, that can only mean two things: it was perfectly done and no one knows how to improve it, or it is so ordinary that it does not seem useful or interesting to others; the second case is the most common. In addition, if you are not self-critical about your own work, that is a sign of being conformist. I do not like being conformist because that stops you from growing and learning.

  • How autonomous are you when developing:

I am very used to working autonomously: in my past job I was in charge of a whole project by myself, and as a researcher I am very proactive. I like to think through and design everything as carefully as I can before writing code because I find it a good practice for saving time in the long term. However, I understand that sometimes it is quite useful to write some toy prototypes to evaluate your ideas. This is frequent in research, when you do not know how something will turn out and you test it using tools like Octave that support fast prototyping very well. Even if the results are not satisfying, they are never a failure as long as you learn from them!

Y: Your task

  • Did you select a task from our list of proposals and ideas?

Yes, I would like to work on the Neural Networks package: Convolutional Neural Networks [1]

  • Please provide a rough estimated timeline for your work on the task. This should include the GSoC midterms and personal commitments like exams or vacation ("non-coding time"). If possible, include two or three milestones you expect.

I plan to work on this project daily, spending at least 35 hours a week. I may take part in a summer school on Machine Learning here in Spain in June, but that would only be five days and I can bring my laptop with me. This is the schedule I would like to follow:

Before 30 May:

I want to become more familiar with the community and the mentors. I would like to contribute as much as I can to the nnet forge package. I plan to check whether it is possible to reuse code from that package, since it is related to neural networks. In addition, before coding we should study in depth how the Matlab CNN toolbox works, so that our package can have a modular and configurable design that is compatible with it. It is critical to find out whether Pytave offers enough functionality to use TensorFlow (or even Keras) through the Python interface.

Phase 1, until 30 June:

Write the basic layers (image input layer, convolutional layer, ReLU layer, max/avg pooling layer, fully connected layer and dropout layer), test their functionality and check their compatibility with Matlab code.

Phase 2, until 28 July:

Write the remaining layers, such as the classification, softmax and regression layers. Write the main CNN class, which holds the layers, the training process and the classify method. At this point we should be able to train some networks and test their functionality. Once again, I would check its compatibility with the Matlab toolbox.

Phase 3, until 29 August:

Add extra features like the activations class, which can display a layer's activations, and the possibility to load pretrained networks. We could start by defining the AlexNet and VGG16 networks, but this could be extended with an exporting/importing tool to save and load a CNN's architecture and weights to and from files. Add documented use cases to create a tutorial.
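As a rough illustration of the exporting/importing idea, the following minimal Python sketch stores an architecture and its weights in a single JSON file and reads them back. The file layout, the function names save_network and load_network, and the layer field names are all hypothetical, chosen only for this example; the real package could well use a different format.

```python
import json


def save_network(path, layers, weights):
    """Write the architecture and the weights to a single JSON file.

    The layout {"layers": ..., "weights": ...} is a hypothetical format.
    """
    with open(path, "w") as handle:
        json.dump({"layers": layers, "weights": weights}, handle, indent=2)


def load_network(path):
    """Read back the (layers, weights) pair written by save_network."""
    with open(path) as handle:
        data = json.load(handle)
    return data["layers"], data["weights"]


# Toy network description; layer names and fields are illustrative only.
layers = [
    {"type": "imageInput", "size": [28, 28, 1]},
    {"type": "convolution2d", "filterSize": [5, 5], "numFilters": 20},
    {"type": "relu"},
]
weights = {"convolution2d_1": [[0.1, -0.2], [0.05, 0.3]]}
```

Calling save_network("net.json", layers, weights) followed by load_network("net.json") should return the same two structures, which is the round-trip property an import/export tool would need to guarantee.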

Testing:

As for testing the layers, I plan to write black-box unit tests before implementing them. That is, writing small pieces of code that check that, given an input to the layer, the expected output is returned. This would be useful to check two things: that the implementation is correct (we are calling the TensorFlow API properly) and that it is compatible with Matlab's CNN toolbox, because we could make both have the same behaviour.
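A minimal sketch of what such black-box tests could look like, written in Python. The functions relu_layer and max_pool_1d are hypothetical stand-ins for the layers under test (in the real package they would be replaced by the actual layer calls); only the input/expected-output style of the checks is the point here.

```python
def relu_layer(values):
    """Hypothetical stand-in for the layer under test: element-wise max(0, x)."""
    return [max(0.0, v) for v in values]


def max_pool_1d(values, size):
    """Hypothetical stand-in: non-overlapping max pooling over a flat list."""
    return [max(values[i:i + size])
            for i in range(0, len(values) - size + 1, size)]


def test_relu_clips_negative_inputs():
    # Given a known input, the layer must return the known expected output.
    assert relu_layer([-2.0, 0.0, 3.5]) == [0.0, 0.0, 3.5]


def test_max_pool_keeps_window_maximum():
    # Windows are [1, 5], [2, 4] and [3, 0].
    assert max_pool_1d([1, 5, 2, 4, 3, 0], 2) == [5, 4, 3]


# Run the checks directly; a test runner could collect them instead.
for check in (test_relu_clips_negative_inputs, test_max_pool_keeps_window_maximum):
    check()
```

The tests only look at inputs and outputs, so they could be written before the layers exist and reused unchanged once the stand-ins are swapped for the real implementation.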

We are not implementing the core mathematical functions but just interfaces to TensorFlow, which has these functions already implemented. What I would test here is our call to the TensorFlow API. For example, we could test that when creating a convolutional layer with a given input and kernel shape, the output shape is as expected.
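The expected output shape for such a test can be computed independently of TensorFlow with the standard formula floor((input + 2 * padding - kernel) / stride) + 1 per spatial dimension. The sketch below is only a reference calculation; the function names and the (height, width, channels) layout are assumptions made for this example.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Output size of a convolution along one spatial dimension."""
    if stride < 1:
        raise ValueError("stride must be at least 1")
    span = input_size + 2 * padding - kernel_size
    if span < 0:
        raise ValueError("kernel is larger than the padded input")
    return span // stride + 1


def conv_output_shape(input_shape, kernel_shape, num_filters,
                      stride=1, padding=0):
    """Expected (height, width, channels) of a 2-D convolutional layer.

    input_shape is assumed to be (height, width, channels).
    """
    height, width, _channels = input_shape
    kernel_height, kernel_width = kernel_shape
    return (conv_output_size(height, kernel_height, stride, padding),
            conv_output_size(width, kernel_width, stride, padding),
            num_filters)


# A 28x28 grey-scale image with twenty 5x5 filters, stride 1, no padding.
expected = conv_output_shape((28, 28, 1), (5, 5), num_filters=20)
assert expected == (24, 24, 20)
```

A unit test would then compare this expected tuple with the shape reported by the layer created through the TensorFlow interface.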

Once we have every module implemented and tested, I plan to test the whole unit during Phase 2 by performing gradient checking and by plotting the network performance (scored with metrics like accuracy, cost, F1-score...) against parameters such as the number of epochs or the training set size. This could be done using simple datasets like MNIST.
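Gradient checking compares an analytic gradient with a numerical estimate obtained by central differences, (f(w + eps) - f(w - eps)) / (2 * eps), one parameter at a time. The sketch below shows the idea on a toy squared-error loss of a linear model; the loss, the sample and all function names are assumptions made for illustration and are not part of the planned package.

```python
def numerical_gradient(f, params, eps=1e-5):
    """Central-difference estimate of the gradient of f at params."""
    grads = []
    for i in range(len(params)):
        plus = list(params)
        minus = list(params)
        plus[i] += eps
        minus[i] -= eps
        grads.append((f(plus) - f(minus)) / (2 * eps))
    return grads


def relative_error(a, b):
    """Norm of the difference divided by the sum of the norms."""
    diff = sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    scale = sum(x * x for x in a) ** 0.5 + sum(y * y for y in b) ** 0.5
    return diff / scale if scale > 0 else 0.0


# Toy model: squared error of a linear prediction on one fixed sample.
SAMPLE_X = [1.0, 2.0, -1.0]
SAMPLE_Y = 0.5


def loss(w):
    pred = sum(wi * xi for wi, xi in zip(w, SAMPLE_X))
    return 0.5 * (pred - SAMPLE_Y) ** 2


def analytic_gradient(w):
    pred = sum(wi * xi for wi, xi in zip(w, SAMPLE_X))
    return [(pred - SAMPLE_Y) * xi for xi in SAMPLE_X]


w = [0.3, -0.2, 0.1]
error = relative_error(numerical_gradient(loss, w), analytic_gradient(w))
assert error < 1e-6  # the two gradients should agree closely
```

For the full network, the same comparison would be made between the gradients used during training and the numerical estimate on a handful of parameters, since checking every weight this way is slow.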

Since we will be running TensorFlow code under the surface, we could go a step further and use the TensorBoard utility to check some of these metrics, or even to verify with its graph dashboard that the displayed network corresponds to the architecture defined in the m-code. This utility could be enabled for the Octave package.

Documenting:

Throughout the whole process I plan to keep the code well documented using Doxygen. If we end up using TensorFlow's Python interface, we could look for another way of documenting it (such as Sphinx).