Sudeepam



:'''3) A network trained with the correct spellings of the functions and the most common typographic errors'''
To make this kind of Neural Network, we need to know what common typographic errors look like. With that goal in mind, I have already contacted the people behind octave-online.net [https://octave-online.net/], who say that they are happy to support the development of GNU Octave and have shared a list of the top 1000 misspellings with me by email. However, the users of octave-online.net are only one part of the entire user group. '''For best results''', we would require the involvement of the entire Octave community, which also implies that this will be the hardest and the most fun Neural Network to make.
 
The plan is to create a script that catches typographic errors, ask the users of GNU Octave to run this script and share their most common spelling errors with us, and train the network on the data-set thus created. This would give us a Neural Network that understands what '''correct spellings and the most common typographic errors''' look like. Such a network would give good results almost every time and with all kinds of errors, because when the network knows what common errors look like, most of the time it would '''know the answer''' beforehand. For the remaining cases, the network would still be able to '''predict the correct answer'''.
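As a rough illustration only, here is a minimal Octave sketch of what such a data-collection script could look like. The function name, the log-file location, and the integration point mentioned in the comments are placeholders chosen for this sketch, not part of any existing or proposed Octave interface.

<pre>
## Hypothetical logging helper: records every identifier that triggered an
## "undefined" error, so that frequent misspellings can later be aggregated
## and, with the user's consent, shared.  All names here are illustrative.
function log_typo (typed_name)
  logfile = tilde_expand ("~/.octave_typo_log.csv");
  fid = fopen (logfile, "a");
  if (fid >= 0)
    fprintf (fid, "%s,%s\n", typed_name, datestr (now ()));
    fclose (fid);
  endif
endfunction

## One possible (unverified) integration point would be Octave's
## missing_function_hook, which receives the name of an undefined function:
##   missing_function_hook ("log_typo")
</pre>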


At a later stage (possibly after GSoC), I could merge the data-extraction script into Octave so that the performance of the Network could keep improving over time. This could come with an easy way to disable it, so that only those users who wish to share their spelling errors would do so.
I understand that using Neural Networks may seem like overkill, and that one could think of using traditional data structures like tries, or algorithms like 'edit distance', which are made for exactly these kinds of problems.
 
However, edit distance, while accurate, would be the slowest approach of the three, and a trie, though fast, would not be able to generalize to unknown typographic errors. Neural networks, when trained with proper data, would be highly accurate, would generalize to unknown typographic errors, and, because ultimately '''a 'trained' Neural Network''' will be merged with Octave, this approach will be fast as well. Another disadvantage of a trie worth mentioning is that if we are unable to gather a sufficiently large list of common spelling errors, a trie would fail miserably, whereas a neural network would, even in that case, easily identify letter substitutions and transpositions of adjacent letters.
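To make the comparison concrete, the edit distance referred to above is just the textbook Levenshtein recurrence, which in Octave looks roughly like this (a plain reference implementation, not code intended for Octave core):

<pre>
## Textbook Levenshtein (edit) distance between two strings.
function d = edit_distance (s, t)
  m = numel (s);
  n = numel (t);
  D = zeros (m + 1, n + 1);
  D(:, 1) = (0:m)';    # cost of deleting every character of s
  D(1, :) = 0:n;       # cost of inserting every character of t
  for i = 2:m + 1
    for j = 2:n + 1
      cost = (s(i - 1) != t(j - 1));
      D(i, j) = min ([D(i - 1, j) + 1, D(i, j - 1) + 1, D(i - 1, j - 1) + cost]);
    endfor
  endfor
  d = D(m + 1, n + 1);
endfunction

## Example: a pair of transposed letters costs two single-character edits.
edit_distance ("polt", "plot")    # => 2
</pre>

The slowness comes from having to evaluate this recurrence against every known function name for every misspelled identifier.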
 
This is why, after due consideration as described above, '''neural networks look like the best solution to minimize the trade-off between speed and accuracy of the feature''', and this is why I have chosen to use them.
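Purely to illustrate why the inference step of an already trained network is cheap, here is a self-contained Octave sketch of a single forward pass. The feature choice (character bigrams), the layer sizes, the candidate list, and the random weights are all placeholders for this sketch and do not describe the architecture that would actually be trained.

<pre>
## Placeholder forward pass: one hidden layer over character-bigram counts,
## followed by a softmax over a tiny set of candidate function names.
function_names = {"plot", "printf", "zeros"};     # illustrative candidates

function x = bigram_features (name)
  ## Map a name to a 26*26 vector of lowercase character-bigram counts.
  x = zeros (26 * 26, 1);
  s = double (tolower (name)) - double ("a") + 1;
  s = s(s >= 1 & s <= 26);                        # keep alphabetic characters
  for k = 1:numel (s) - 1
    x((s(k) - 1) * 26 + s(k + 1)) += 1;
  endfor
endfunction

W1 = 0.01 * randn (20, 26 * 26);  b1 = zeros (20, 1);   # placeholder weights
W2 = 0.01 * randn (3, 20);        b2 = zeros (3, 1);

x = bigram_features ("polt");
h = max (W1 * x + b1, 0);                         # ReLU hidden layer
z = W2 * h + b2;
p = exp (z - max (z));  p /= sum (p);             # softmax scores
[~, best] = max (p);
printf ("Did you mean '%s'?\n", function_names{best});
</pre>

With trained weights in place of the random ones above, answering a query amounts to a couple of matrix-vector products, which is why the merged, pre-trained network can be fast at run time.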