User:Sudeepam: Difference between revisions

Revision as of 23:56, 21 March 2018

A: An introduction

Please describe yourself in three sentences, one of them regarding your current studies.

- My name is P Sudeepam. I like to code and make music.

- I am a second year student of Jaypee Institute of Information Technology, Noida, Uttar Pradesh, India.

- I am pursuing a major in Electronics and Communication Engineering.

Which languages do you speak?

- I am comfortable communicating in Hindi and English.

What's your overall background?

- I have been coding since my 6th standard and as such, I have developed a working knowledge of how programming problems should be approached.

- My areas of interest are Machine Learning, Digital Signal Processing, and Algorithms.

- I learn Machine Learning through online resources such as open research papers, blogs, and MOOCs since my college does not offer a course on this subject till the final year.

- Digital Signal Processing is one of the core subjects of my major. I have been learning it, and related subjects such as 'Basics of Signals and Systems', for a year now.

- I have been coding for many years now. My knowledge of Algorithms is something that I have gained through those years of practice and repeated attempts to optimize my code.

- I will be taking a formal course on Algorithms in my next semester.

Why do you want to participate in the Google Summer of Code? What do you hope to gain by doing so?

The good part of the projects that I have done, or will do, as part of my major is that they are mentored and judged by a professor who has a deep understanding of the subject. The bad part is that, for those projects, the University restricts me to the concepts that it has taught me. In self-undertaken projects, on the other hand, I have the flexibility to use whatever I want, but there is no one available to oversee them.

My main reason to participate in the GSoC is directly related to this. A mentor will be available to oversee my project and, as long as my solution is optimal (which I'll make sure it is), I will be able to approach the problems the way I would like to. The tools I'd be allowed to use (like the version control system) will probably be limited, but that is understandable.

In addition to that, it will be a great opportunity to contribute to the Open Source community and most importantly, to learn how professional Software Development work is carried out.

Please also describe your previous experience with the GSoC, if any.

This is my first time applying for GSoC.

Why are you choosing Octave?

When I took up the course on 'Basics of Signals and Systems' as a part of my college coursework, about a year ago, I was required to use MATLAB for coding assignments. I never wanted to use MATLAB under a student's license because I thought that this would land me in a very uncomfortable situation once I was no longer a student. Even if I had decided to, I would not have been able to afford it. I started looking for open source alternatives and came across GNU Octave. Since then, I have been using Octave as one of my primary programming languages, mainly for Signal Processing and Machine Learning.

GNU Octave is software that has become an integral part of my academic life. As a GSoC student, I will be able to dedicate my time entirely to contributing to Octave. Hopefully, I will end up making some noteworthy contributions that are not only beneficial for the Octave community, but also make me feel like I did my duty to pay back the software that has been helping me throughout my college. This is why I chose to contribute to GNU Octave.

C: Contact

Please state the (unique and identical where possible) nick you use on IRC and any other communication channel related to Octave.

-IRC Nickname: peesu

-Github: sudeepam97

-Gmail: sudeepam.pandey@gmail.com

Which time zone do you live in? Will that change over GSoC duration?

- I live in the UTC+05:30 time zone.

-No, the time zone will not change over GSoC duration.

Please state the timeframe (in UTC+0) when you feel most comfortable working during GSoC. Where are your time buffers?

- Although my work schedule is flexible and I could work at any time of the day, if I were to give an exact time slot, I'd say I'll work between 4:30pm - 9:30pm (UTC+0) and 6:30am - 11:30am (UTC+0).

E: Coding experience

Please describe your experience with C++, Octave or Matlab m-scripts, OpenGL and Qt.

- MATLAB m-scripts: I am highly experienced with MATLAB m-scripts. As I have said before, a lot of my coursework assignments involve writing m-scripts. Those assignments have even asked me to make my own implementations of inbuilt MATLAB functions, and I have been using that experience to contribute to the Octave-Forge signal package. In addition to this, I also implement machine learning in Octave and have gained further experience with m-scripts by doing that. If selected for GSoC 2018, I will approach my project with m-scripts, and so I consider my experience to be a big plus point.

- C++: I am familiar with C++ but I have never made a formal project using it and I haven't been using it lately. The language will not be new to me and I can quickly revise it if required. I was taught C and C++ in my first year, in a course called "Software Development Fundamentals". At the time, I decided to use C and not C++ for my project. That I passed that subject with an 'A' grade is all I can currently say about my C++ experience.

- OpenGL and Qt: I have never used OpenGL before. I have never used Qt either but I've seen some Qt code of my friends.

Please describe your experience with other programming languages.

- C: I am comfortable with C and have made a number of projects using it.

- Java: This was my first language (well, not exactly: my first language was BASIC). I have 4 years of experience with Java, but I haven't used it for making projects. I use this language mainly for problem-solving (competitive programming) questions.

- Python: I am familiar with the language but I still am learning to use it effectively. I am learning it so that I can use it for Machine Learning problems if required.

Please describe your experience with being in a development team. Do you have experience working with open source or free projects?

-The only experience I have with working on open source and free projects is the experience that I have got by

a) contributing to some open source repositories on GitHub.

b) contributing missing functions to the Octave-Forge Signal package.

I do have the experience of working on a few projects with a team but those were small projects and we never used version control systems while making them.

Please describe the biggest project you have written code for and what you learned by doing so. Also, describe your role in that project over time.

-I'll assume that by 'biggest project', you mean the project which required me to play the most important roles and write a lot of code which required a lot of testing and debugging.

In this sense, my biggest project was that of making an Arduino and LDR sensor based maze solver bot with a team of four. The bot would traverse a maze in a 'trial run' and store all the correct paths (paths that would not lead to a dead end) in its memory whenever it encountered a 'T-like' condition in the maze (a condition where more than one choice of path was available). After that, in any subsequent run, the bot would traverse the maze using the stored path (which avoided dead ends) and therefore solve the maze in a shorter amount of time.

I was the team leader in this project and as the leader, my primary roles were to..

a) Give directions to the team in a way that every team member's individual skill-set could be taken into use.

b) Boost the morale of the team whenever it seemed like the project would not get completed in time.

These were my primary roles as the leader. My technical role was to write the code or to make a 'functioning brain' of the bot. Roles of my team mates were to design the hardware, design testing tracks etc.

Some soft skills I learned from this project were..

a) How to work with a team.

b) The roles and importance of a leader.

Some technical skills I learned were...

a) Technicalities of Arduino, LDR Sensors, motor drivers etc.

b) Technicalities of writing Arduino/C code.

Please state the commits and patches you already contributed to Octave.

- I've contributed a patch for the db2pow function of the signal package.

- I've contributed a patch for the pow2db function of the signal package.

- I have also helped review the cconv function by suggesting a faster algorithm. My suggestion was considered and implemented.

F: Feeling fine

Please describe (in short) your experience with the following tools

- IRC and mailing lists

-I only started using IRC after I decided to apply for GSoC and contribute to Octave. I hang out in the #Octave channel and I believe I understand how IRC works. I have also been using the mailing lists and I am comfortable with them.

- Mercurial or other source code management systems

-I am comfortable with git and while submitting patches to Octave-Forge I have understood how Mercurial works. I understand how patches are contributed in the Octave community.

- Mediawiki or other wiki software

-I learned a little while writing this application; I have not used it beyond that.

- make, gcc, gdb or other development tools

-I have used the gcc compiler, and I have used make while building Octave 4.2.1 from source. I have not used gdb before.

What will make you actively stay in our community after this GSoC is over?

-There are many things that I would like to complete in/add to Octave. Those things, and the fact that Octave is such an important piece of software for me will make me stay in the community even after GSoC.

O: Only out of interest

Did you ever hear about Octave before?

-Yes, I have heard about Octave before. I heard about it about a year ago, when my college coursework required me to code assignments in MATLAB and I decided to search for an open source alternative.

What was the first question concerning Octave you could not find an answer to rather quickly?

-I got most of my questions answered online or on the IRC channel itself. This may be a negative, but I keep asking till I have an answer so I haven't really encountered a situation when my question was not answered.

P: Prerequisites

Please state the operating system you work with.

-I mostly work on Linux Mint. I have used Windows before and can work on that if required.

Please estimate an average time per day you will be able to access

- an internet connection

-Any time of the day (i.e. 24 hours/day)

- a computer

-Any time of the day (i.e. 24 hours/day)

- a computer with your progressing work on

-Any time of the day (i.e. 24 hours/day)

Please describe the degree to which you can install new software on computers you have access to.

-I have root access on my PC, so I can install any software on it. I'll be using my personal computer for most of the work.

-If required, I can access the computers of my University which use Windows 10 but I won't have administrative privileges on those Desktops.

S: Self-assessment

Please describe how useful criticism looks from your point of view as committing student.

- I always appreciate constructive criticism.

How autonomous are you when developing:
- Do you like to discuss changes intensively and not start coding until you know what you want to do?
- Do you like to code a proof of concept to 'see how it turns out', modifying that and taking the risk of having work thrown away if it doesn't match what the project or original proponent had in mind?

- That depends on the task. I'd say that, if the outcome is defined or at least predictable with some decent accuracy, I'll discuss the problem statement and immediately get down to code. Otherwise, I'll code up a small model first to see if the approach really would work and proceed thereafter.

Y: Your task

Did you select a task from our list of proposals and ideas? If yes, what task did you choose? Please describe what part of it you especially want to focus on if you can already provide this information.

Yes, I have decided to work on the command line suggestion feature [1]. This feature is essentially a complex decision-making problem and therefore I will approach it with Neural Networks, made using Octave (m-scripts) itself.

My special focus would be to have a minimal trade-off between the accuracy and speed of the feature. Please look at the last and additional section of 'Project Description' for technical details. I would like to apologize for creating this extra part but it describes some of the important technicalities of this project and I believed that it should have been present.

Please provide a rough estimated timeline for your work on the task.

Preparations for the project (pre-community bonding)

While this application is being reviewed, I have started working on an m-script which will be used to catch the most common spelling errors that the users make. This list of errors could then be...

-Uploaded to a secure server directly.

-Stored as a text file and we can ask the users to share this file with us.

Community Bonding period

I will use the community bonding period to...

-Persuade the community to use our data extraction script and help us collect training data. This can be done by discussing the benefits of a command line suggestion feature and sharing my current implementation of this feature [2].

-Ask the community to report issues with the m-script containing the current implementation. I’ll shift the current implementation to Mercurial if required.

-Discuss how we should receive the data generated by the users, work on the approach, and start the collection of data.

-Organize the data as it is received and divide it to create proper, training, cross-validation, and test sets for the Neural Network.
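As a rough illustration of the last item, the division into training, cross-validation, and test sets could be sketched as below. This is a minimal sketch in Python rather than Octave; the function name `split_dataset`, the 60/20/20 fractions, and the representation of the data as (misspelling, correct name) pairs are all assumptions made for the example, not part of the proposal.

```python
import random


def split_dataset(pairs, train_frac=0.6, cv_frac=0.2, seed=0):
    """Shuffle (misspelling, correct_name) pairs and split them three ways.

    The fractions are illustrative defaults; whatever remains after the
    training and cross-validation sets becomes the test set.
    """
    if not (0 < train_frac < 1 and 0 <= cv_frac < 1 and train_frac + cv_frac < 1):
        raise ValueError("fractions must leave room for a test set")

    shuffled = list(pairs)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)

    n_train = int(len(shuffled) * train_frac)
    n_cv = int(len(shuffled) * cv_frac)

    train = shuffled[:n_train]
    cv = shuffled[n_train:n_train + n_cv]
    test = shuffled[n_train + n_cv:]
    return train, cv, test
```

Shuffling before splitting matters here because the data would arrive grouped by contributor or by time, and each set should contain a similar mix of errors.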

May, 14 – June, 10 (4 weeks)

Week 1 (May, 14 – May, 20): I would not be able to do a lot of work in this week as I have my final examinations at this time. I’ll take this week as an extension of the community bonding period and use it to collect issues, collect more data and divide it into proper datasets.

Week 2 and Week 3 (May, 21 – June, 3): Most of the code of the Neural Network would be identical to my current implementation and so I’ll start by making my current implementation bug free (Some known issues can be found here: [3]) and by coding it according to the Octave coding standards. I plan to keep the user data coming for these weeks also and so I’ll leave room for network parameters such as the number of hidden layers and the number of neurons per hidden layer because these are data dependent parameters. If all this work gets completed before the expected time, I’ll automatically move on to complete next week’s work.

Week 4 (June, 4 – June, 10): By now we will have sufficient data, both from octave-online.net and from approximately 6 weeks of usage of the extraction script. I’ll quickly give a final look to the data and start training the Neural Network with it. I will choose appropriate values of the data-dependent network parameters which, while keeping the Neural Network fast, would fit the learning parameters (weights) of the Neural Network to our data with a high level of accuracy. I would then measure the accuracy of the Network on the cross-validation and test sets and see how our network generalizes to unknown typographic errors. I will also write some additional tests for the various m-scripts used.

Phase 1 evaluations goal: A set of working neural network m-scripts, which could suggest corrections for typographic errors.

June, 11 – July, 8 (4 weeks)

Week 5 (June, 11 – June, 17): I’d like to take this week to work in close connection with the community and perform tests on the newly created m-scripts. Essentially, I’ll be asking the community to try out our m-scripts and see how they work for them. I will work on the issues pointed out by the community and by the mentors as they are reported and would try to make the m-scripts perfect in this week itself.

Week 6 (June, 18 – June, 24): I’ll fix any remaining issues and proceed to discuss and understand how our Neural Network should be integrated with Octave. I’ll start working on integrating the network as soon as the approach is decided. It is worth mentioning here that we will merge a trained network with Octave and therefore the chances of our code being slow are eliminated.

Week 7 – Week 8 (June, 25 – July, 8): I will integrate our neural network with Octave as discussed, and write and perform tests to make sure that everything works the way it should. If this task gets completed earlier than expected, I’ll automatically move on to the next task.

Phase 2 evaluations goal: A development version of Octave which has a command line suggestion feature (currently there will be no mechanism available to easily select the corrections suggested and easily enable/disable this feature).

July, 9 – August, 5 (4 weeks):

Week 9 (July, 9 – July, 15): The development version of Octave, with an inbuilt suggestion feature will be open for error reports. I’ll work on the issues as they are reported and also discuss what an easy enable/disable mechanism and the mechanism to easily select the corrections suggested should be like.

Week 10 (July, 16 – July, 22): I’ll create the required mechanisms as discussed, write and perform tests, and push a development version with a complete command line suggestion feature.

Week 11 – Week 12 (July, 23 – August, 5): I’ll work in close connection with the community, fix the issues that are reported, and ask for further suggestions on how the command line suggestion feature could be made better.

Phase 3 evaluations goal: A development version of Octave with a complete and working command line suggestion feature, open to feedback and criticism.

Project Description

Let me first describe the three kinds of Neural Networks that we can end up making (depending on the training data available).

A network trained with only the correct spellings of the inbuilt functions

This type of network would be very easy to make because it requires only a list of all the existing functions of GNU Octave and no additional data. With this approach, we would end up creating a Neural Network which would easily understand typographic errors caused by letter substitutions and transpositions of adjacent letters. In fact, this network would understand multiple letter substitutions and transpositions, not just single ones. I say this with such confidence because I have already made a working neural network of this type [4] (https://github.com/Sudeepam97/Did_You_Mean). This network would, however, perform poorly if an error is caused by the accidental inclusion or deletion of letters.
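One plausible way to see why substitutions and transpositions are easier than insertions and deletions is sketched below in Python. It assumes a fixed-length, position-by-position one-hot encoding of the function name. That encoding is an assumption made purely for illustration, since the input encoding of the existing network is not described here, and the names `ALPHABET`, `MAX_LEN`, `encode`, and `overlap` are hypothetical.

```python
import string

# Hypothetical input alphabet and maximum name length for the illustration.
ALPHABET = string.ascii_lowercase + string.digits + "_"
MAX_LEN = 12


def encode(name, max_len=MAX_LEN):
    """One-hot encode each character position and flatten to a 0/1 list."""
    vec = [0] * (max_len * len(ALPHABET))
    for pos, ch in enumerate(name[:max_len]):
        idx = ALPHABET.find(ch)
        if idx >= 0:  # characters outside the alphabet are left as all zeros
            vec[pos * len(ALPHABET) + idx] = 1
    return vec


def overlap(a, b):
    """Count positions where both names have the same character."""
    return sum(x & y for x, y in zip(encode(a), encode(b)))


print(overlap("zeros", "zeros"))  # 5: identical input
print(overlap("zeris", "zeros"))  # 4: one substitution disturbs one position
print(overlap("zeors", "zeros"))  # 3: a transposition disturbs two positions
print(overlap("zros", "zeros"))   # 1: a deletion shifts every later letter
```

Under this assumed encoding, a substitution or transposition leaves most of the input vector unchanged, while a single deleted or inserted letter shifts everything after it, so the misspelling no longer resembles the correct name.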

A network trained with the correct spellings of the functions and self created errors

This would be slightly harder to make but should give us improved performance. I will create some misspellings of all the functions by the insertion, deletion, substitution, and transposition of one or two letters, and then add all these self-created misspellings to the dataset which will be used to train the network. Such a network would understand what correct spellings and random typographic errors look like. It will easily understand substitutions and transpositions like the previous network but would also be more accurate while predicting errors caused by additions/deletions. However, it is worth mentioning here that we may create errors while creating errors. Because our training data will be modified randomly, the Neural Network may, although the chances are small, show unpredictable behaviour.
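The generation of self-created errors described above could be sketched as follows. This is a minimal Python sketch, not the proposed m-script: the helper names, the lowercase-only alphabet, and the number of misspellings per function are assumptions. It also shows one way to guard against "creating errors while creating errors", by discarding any generated string that is itself a valid function name.

```python
import random
import string


def make_misspelling(name, rng):
    """Apply one random insertion, deletion, substitution, or transposition."""
    if not name:
        raise ValueError("name must be non-empty")
    letters = string.ascii_lowercase
    ops = ["insert", "substitute"]
    if len(name) > 1:  # deletion and transposition need at least two letters
        ops += ["delete", "transpose"]
    op = rng.choice(ops)

    if op == "insert":
        i = rng.randrange(len(name) + 1)
        return name[:i] + rng.choice(letters) + name[i:]
    if op == "delete":
        i = rng.randrange(len(name))
        return name[:i] + name[i + 1:]
    if op == "substitute":
        i = rng.randrange(len(name))
        replacement = rng.choice([c for c in letters if c != name[i]])
        return name[:i] + replacement + name[i + 1:]
    # Transpose two adjacent letters.
    i = rng.randrange(len(name) - 1)
    return name[:i] + name[i + 1] + name[i] + name[i + 2:]


def build_training_pairs(function_names, per_name=3, seed=0):
    """Return (typed, correct_name) pairs: each name plus generated typos."""
    rng = random.Random(seed)
    known = set(function_names)
    pairs = []
    for name in function_names:
        pairs.append((name, name))  # the correct spelling maps to itself
        made, attempts = 0, 0
        while made < per_name and attempts < 20 * per_name:
            attempts += 1
            candidate = name
            for _ in range(rng.choice([1, 2])):  # one or two edited letters
                candidate = make_misspelling(candidate, rng)
            if candidate in known:
                # The "typo" is a real function name (or the name itself),
                # so labelling it as a misspelling would be wrong.
                continue
            pairs.append((candidate, name))
            made += 1
    return pairs
```

A call such as `build_training_pairs(["zeros", "ones", "plot"])` yields each correct name together with a few random misspellings of it. The same misspelling can still be generated for two different functions, which is a remaining source of the unpredictable behaviour mentioned above.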

A network trained with the correct spellings of the functions and the most common typographic errors

To make this kind of Neural Network, we need to know what common typographic errors look like. With that goal in mind, I have already contacted the people behind octave-online.net [5] (https://octave-online.net/), who say that they are happy to support the development of GNU Octave and have shared a list of the top 1000 misspellings with me through email. However, the users of octave-online.net are only one part of the entire user group. For best results, we would require the involvement of the entire Octave community, which also implies that it will be the hardest and the most fun Neural Network to make.

By creating a script that would be able to catch typographic errors, asking the users of GNU Octave to use this script and share the most common spelling errors with us, and training the network on the dataset thus created, we’ll create a Neural Network which would understand what correct spellings and the most common typographic errors look like. Such a network would give good results almost every time and with all kinds of errors. This is because, when our network knows what common errors are like, most of the time it would know the answer beforehand. For the remaining cases, the network would be able to predict the correct answer.

I understand that using Neural Networks may seem like overkill and that one could think about using traditional data structures like tries, or algorithms like 'edit distance', which are made for exactly these kinds of problems. However, I have chosen neural networks because, after due consideration, as described below, they look to me like the best solution to minimize the trade-off between speed and accuracy of the feature.

Edit distance, while being accurate, would be the slowest approach of the three, and tries, though fast, would not be able to generalize to unknown typographic errors. Neural networks, however, when trained with proper data, would be highly accurate and would generalize to unknown typographic errors, and because a 'trained' Neural Network will ultimately be merged with core Octave, this approach will be fast as well. Another disadvantage of tries that I'd like to mention is that if, say, we are unable to arrange a sufficiently large list of common spelling errors, a trie would fail miserably, whereas a neural network, even in that case, would easily identify letter substitutions and transpositions of adjacent letters.
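For comparison, the edit-distance alternative discussed above can be sketched as below. This is a minimal Python sketch of the standard dynamic-programming formulation (the "optimal string alignment" variant, which also counts adjacent transpositions); it is not taken from any Octave code, and the function names and the `max_distance` threshold are assumptions.

```python
def osa_distance(a, b):
    """Edit distance counting insertions, deletions, substitutions,
    and transpositions of adjacent letters (optimal string alignment)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]

    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[rows - 1][cols - 1]


def suggest(typed, function_names, max_distance=2):
    """Return the closest known name, or None if nothing is close enough."""
    if not function_names:
        return None
    distance, name = min((osa_distance(typed, n), n) for n in function_names)
    return name if distance <= max_distance else None
```

In this sketch, `suggest("zeors", ["zeros", "ones", "plot"])` returns `"zeros"`. Each suggestion compares the typed word against every known function name, so the work per suggestion grows with the size of the function list; that is the cost the comparison above refers to.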

At a later stage (possibly after GSoC), I could merge the data extraction script with Octave so that the performance of the Network could be improved with time. This could come with an easy disable feature, so that only the users who would like to share their spelling errors would do so.

