User:Dspies: Difference between revisions

From Octave
Jump to navigation Jump to search
 
(3 intermediate revisions by the same user not shown)
Line 30: Line 30:
'''C++'''<br />
'''C++'''<br />
I've programmed somewhat on and off in C++, but I've never worked on a project with a group or intended for public consumption.  A couple years ago I wrote a Quadratic Number Field Sieve using gmp and that was the first significant piece of code I wrote in C++.  A couple months ago I wrote a preprocessor for grounded Answer Set Programs that does something analogous to forward-checking to infer binary constraints. During this project I read "Effective C++" and some of "C++ Templates: The Complete Guide". I did extensive online browsing to understand C++-11 rvalue references and perfect forwarding (although I realize Octave doesn't use C++-11 so those won't be available). I also posted a lot of questions on Stack Overflow.<br />
I've programmed somewhat on and off in C++, but I've never worked on a project with a group or intended for public consumption.  A couple years ago I wrote a Quadratic Number Field Sieve using gmp and that was the first significant piece of code I wrote in C++.  A couple months ago I wrote a preprocessor for grounded Answer Set Programs that does something analogous to forward-checking to infer binary constraints. During this project I read "Effective C++" and some of "C++ Templates: The Complete Guide". I did extensive online browsing to understand C++-11 rvalue references and perfect forwarding (although I realize Octave doesn't use C++-11 so those won't be available). I also posted a lot of questions on Stack Overflow.<br />
Here's a recent C++ project of mine:
https://sourceforge.net/projects/aspmutexpreprocessor/files/?source=navbar <br />
'''Octave m-scripts'''<br />
'''Octave m-scripts'''<br />
I use Octave a lot for minor tasks that come up in research.  It's a wonderful tool and I love coming up with sneaky ways to vectorize bits of code.  Sometimes I go on StackOverflow to look for "how do you vectorize x" questions just so I can answer them.  Most of them are for Matlab though.  I don't know why Octave isn't a more popular alternative.<br />
I use Octave a lot for minor tasks that come up in research.  It's a wonderful tool and I love coming up with sneaky ways to vectorize bits of code.  Sometimes I go on StackOverflow to look for "how do you vectorize x" questions just so I can answer them.  Most of them are for Matlab though.  I don't know why Octave isn't a more popular alternative.<br />
Line 122: Line 124:
[[Category: Summer of Code]]
[[Category: Summer of Code]]


* Codebase Consolidation
== Codebase Consolidation ==
This is a list of things which I think need to be consolidated and cleaned up in the Octave codebase.  Before embarking on my project, I'd feel obliged to handle each of these.
This is a list of things which I think need to be consolidated and cleaned up in the Octave codebase.  Before embarking on my project, I'd feel obliged to handle each of these.
** Permutation Matrices are not index vectors.  They shouldn't subclass Array<octave_idx_type>.  They should instead contain a private instance of Array<octave_idx_type>.  Right now element_type and xelem are both wrong, and elem, checkelem, operator() all return octave_idx_type when they should return bool (for logical correctness) or double (for Matlab compatibility)
* Permutation Matrices are not index vectors.  They shouldn't subclass Array<octave_idx_type>.  They should instead contain a private instance of Array<octave_idx_type>.  Right now element_type and xelem are both wrong, and elem, checkelem, operator() all return octave_idx_type when they should return bool (for logical correctness) or double (for Matlab compatibility)
** Diagonal matrices, permutation matrices and compressed column sparse matrices are all different types of sparse matrices.  There needs to be an AbstractSparse class which is the base of DiagMatrix, PermMatrix and Sparse.  In particular, all three types should have an nz_iter which iterates over its non-zero elements in column-major order.
* Diagonal matrices, permutation matrices and compressed column sparse matrices are all different types of sparse matrices.  There needs to be an AbstractSparse class which is the base of DiagMatrix, PermMatrix and Sparse.  In particular, all three types should have an nz_iter which iterates over its non-zero elements in column-major order.
** Permuation Matrices should always be column-major.  Every other matrix type in the entire code-base is.  This bool _colp just adds unnecessary complication without granting any measurable performance boost except in extremely contrived situations.
* Permuation Matrices should always be column-major.  Every other matrix type in the entire code-base is.  This bool _colp just adds unnecessary complication without granting any measurable performance boost except in extremely contrived situations.
** The cumulative forms of max, min, all, any, sum, and product (and there should also probably be some sort of cum_xor) should all call the same function internally.  They all are forms of reduce with a different binary operation.  Right now, the implementations miss a lot of cases.  Ex try:<br />
* <p> The cumulative forms of max, min, all, any, sum, and product (and there should also probably be some sort of cum_xor) should all call the same function internally.  They all are forms of reduce with a different binary operation.  Right now, the implementations miss a lot of cases.  Ex try:<br />
max(sprand(1000000000, 20, 0.000001), [], 2);<br />
max(sprand(1000000000, 20, 0.000001), [], 2);<br />
This fails with an OOM because it wasn't considered when writing max <br />
This fails with an OOM because it wasn't considered when writing max <br />
This sort of aggregate function (over the rows) can be done in O(min(h+nnz, nnz*log(w))) time, but one would have to implement the algorithm for each of the above-mentioned 7 functions.  By consolidating these into one templated function, it'll make it worth the effort to implement.
This sort of aggregate function (over the rows) can be done in O(min(h+nnz, nnz*log(w))) time, but one would have to implement the algorithm for each of the above-mentioned 7 functions.  By consolidating these into one templated function, it'll make it worth the effort to implement.</p>
** The changes I made in https://savannah.gnu.org/patch/?8417 need to be added and need to be modified to accomodate diagonal matrices as well.  The fact that the original implementer of find neglected to handle diagonal matrices should be evidence enough that the ad-hoc approach doesn't work.
* There has to be a standard way to take an octave_value argument and call the proper template instantiation of a function based on its type. I started doing this with https://savannah.gnu.org/patch/?8417 , but diagonoal matrices still need to be added.  The fact that the original implementer of find neglected to handle diagonal matrices should be evidence enough that the ad-hoc approach doesn't work.

Latest revision as of 19:54, 29 March 2014

Public application template[edit]

This part is answered in public on your user page. Please copy its source (edit) and then fill. Delete any examples and annotations afterwards. Same for questions that do not apply to your situation.

A: An introduction[edit]

  • Please describe yourself in three sentences, one of them regarding your current studies.

I am a Masters student in Computing Science at the University of Alberta. I'm working on logic and reasoning and game tree search algorithms. I enjoy solving math and programming problems.

    • Which languages do you speak?

English, and a little Spanish

    • What's your overall background?

Computing Science

  • Why do you want to participate in the Google Summer of Code? What do you hope to gain by doing so?

I've used Octave as a tool quite a bit in my research and I enjoy writing programs in it. But for dealing with graph operations, it's often not robust or complete and I find myself constantly inventing workarounds to deal with the bugs I encounter. I'd like to help fix those gaps, both to benefit my own research and to gain experience working on an open-source project (I've never seriously worked on a major project intended for public consumption). GSoC seems to be providing a way to receive funding doing that.

    • Please also describe your previous experience with the GSoC, if any.

I'd never heard of it until last Saturday

  • Why are you choosing Octave?

It's a wonderful expressive language that I'm familliar with and enjoy writing programs in. I think for many computational tasks, there's no easy and efficient way to solve them without working in a language that provides extensive support for matrix operations (eg Octave/Matlab). Whereas Matlab is proprietary and seems to assume its users are only interested in minor scripting tasks, Octave is open-source and further provides sufficient infrastructure to write full large-scale projects.

C: Contact[edit]

  • Please state the (unique and identical where possible) nick you use on IRC and any other communication channel related to Octave.

dspies

  • Which time zone do you live in? Will that change over GSoC duration?

MST UTC-7
DST is from March 9 to Nov 2

  • Please state the timeframe (in UTC+0) when you feel most comfortable working during GSoC. Where are your time buffers?

I don't really know. I've always had flexible hours before and I've always taken advantage of that by starting work sometimes at 9 AM and sometimes at noon depending on how I feel. I usually stay up late and don't do anything before 9 unless I have to, but I can if it's requested.
Sorry about dodging the question. I'll come back and answer honestly if it's really important.

E: Coding experience[edit]

This part is one of the more important ones in your application. You are allowed to be as verbose as you want, as long as you stay on topic ;-)

  • Please describe your experience with C++, Octave or Matlab m-scripts, OpenGL and Qt.

C++
I've programmed somewhat on and off in C++, but I've never worked on a project with a group or intended for public consumption. A couple years ago I wrote a Quadratic Number Field Sieve using gmp and that was the first significant piece of code I wrote in C++. A couple months ago I wrote a preprocessor for grounded Answer Set Programs that does something analogous to forward-checking to infer binary constraints. During this project I read "Effective C++" and some of "C++ Templates: The Complete Guide". I did extensive online browsing to understand C++-11 rvalue references and perfect forwarding (although I realize Octave doesn't use C++-11 so those won't be available). I also posted a lot of questions on Stack Overflow.
Here's a recent C++ project of mine: https://sourceforge.net/projects/aspmutexpreprocessor/files/?source=navbar
Octave m-scripts
I use Octave a lot for minor tasks that come up in research. It's a wonderful tool and I love coming up with sneaky ways to vectorize bits of code. Sometimes I go on StackOverflow to look for "how do you vectorize x" questions just so I can answer them. Most of them are for Matlab though. I don't know why Octave isn't a more popular alternative.
Additionally, for my research I recently wrote some more substantial Octave programs including one to build the planning graph for a STRIPS planning problem and identify mutual exclusion constraints among fluents and another to take the mutual exclusion constraint graph and cover it with a minimal set of multicliques (multi-partite complete graphs). Both these tasks are very well-suited to working with sparse matrices.
I've heard that Octave and Matlab have extensive plotting/graphing libraries and I know that many people see that as the primary feature, but I've done almost no substantial work with plotting and graphing (my PGM project was a group project and one of the other students handled that part). OpenGL and Qt
I know that OpenGL and Qt are names of two libraries which a lot of things depend on. I think they have something to do with GUI-building or graphics or something. Sorry.

  • Please describe your experience with other programming languages.

Although it's young, I think D is the best programming language for low-level work when the program needs to run as fast as possible. It takes a smarter approach to templates and safety than C++ without sacrificing anything. I just recently (a couple weeks ago) read http://www.amazon.ca/The-Programming-Language-Andrei-Alexandrescu/dp/0321635361 and started using D and already I think I'm in love. Java is by far the language in which I feel most comfortable writing code. I use it for all programming competitions (I like to compete in programming competitions. I've participated in the ACM a couple times and I made it to Round 3 of Google Code Jam last year). I realize that Java is primarily about portability, but I also like that it has the most extensive and well-documented standard library of any language, and I think eclipse is by far the best IDE for doing things quickly and cleanly. I also use python for scripting because when I just want an answer (eg. questions on projecteuler), it's much easier to work in a language with a powerful interpreter, first-class functions, and generators (and Python generators are just fun). Python is also my alternative to bash. I try to avoid bash scripting (because anything more than a simple process pipe is ugly, painful, and frustrating) in favor python scripts. I also find that ASP-Core-2 can often be handy for a lot of problems (using clingo to ground and solve).

  • Please describe your experience with being in a development team.

I have no serious experience working in a team. Every team project I've worked on in undergrad- or grad-school has been a frustrating disaster where at most two people had any clue what they were doing and nobody actually liked the project or would have chosen it if working alone. The two or three times I've had a chance to pair-code with someone who I really trust and respect as a programmer have been great and I think it's a wonderful way to work. But most people I talk to (even the ones I respect) seem to think pair-coding is a waste of time.
I look forward to an opportunity to work on a project with other strong programmers who really care about it.

  • Please describe the biggest project you have written code for and what you learned by doing so. Also describe your role in that project over time

The biggest project was certainly Gamescrafters. I was in this group in undergraduate for nearly three and a half years. I worked on writing generalized solver code (in Java) which could process "tiered" games in parallel using hadoop. For the last two years, there were other people working on the solvers side with me, but I was unable to properly explain how the codebase I'd written worked and so their projects never got off the ground.
On the one hand, I confess my documentation was absolutely terrible where it existed at all (today I believe I could do much better, I've had practice explaining things more completely and simply as a TA for CMPUT 101 and tend to get fairly positive reviews), but also these students were almost always freshman who had only just taken their first CS class. My approach to indexing game-states relied heavily on dynamic programming as well as results from combinatorics and generating functions and most of these students had just taken freshman calculus and hadn't expected to be doing any math.

  • Please state the commits and patches you already contributed to Octave.

I made a minor change to the m-script for the issymmetric function: http://savannah.gnu.org/bugs/?41426. I also created a significant patch to the find-function which consolidates all the many implementations into one templatized function. This fixes a couple separate bugs/matlab incompatibilities (find(sparse(0,0)), find(sparse(0,1)) etc. all work the same, six or more return values doesn't cause Octave to crash). Additionally, I added a fourth parameter for specifying the desired dimension of the output vector: http://savannah.gnu.org/patch/?8386 I've reported many bugs relating to sparse matrices.

F: Feeling fine[edit]

  • Please describe (in short) your experience with the following tools:
    • IRC and mailing lists

I'm on a few different mailing lists, but I still often forget to hit reply all and end up only sending my response to one person. I haven't used IRC

    • Mercurial or other source code management systems

I'm familliar with svn and git

    • Mediawiki or other wiki software

I know to put \<math\>\</math\> around LaTeX-formatted math expressions and I know how to write an expression in LyX and then copy-paste the generated LaTeX code.

    • make, gcc, gdb or other development tools

I know enough to make eclipse to handle all of that stuff for me (including add the right libraries and gcc configuration options etc. and stepping through execution when debugging). At the moment, I doubt I could do any significant work with those tools outside of eclipse.
I don't know how to write my own makefile

  • What will make you actively stay in our community after this GSoC is over?

Probably I will continue to commit changes as long as my research requires me to use Octave. Considering how useful it is, I expect that will be a lot.

O: Only out of interest[edit]

  • Did you ever hear about Octave before?

Yes

    • If so, when and where? How far have you been involved already?

I learned about Octave just over a year ago when I took a class in machine learning. The prof accepted homework assignments written in Matlab and Octave. I had experience with neither. However, I didn't really start to appreciate it until I was taking a class in PGM (Probabilistic Graphical Models) and one of the homework assignments was to identify whether two items in a Bayesian net were conditionally independent given an evidence set. I noticed that there was a really amazing efficient way to do that with matrix multiply operations after which I started trying to solve every graph problem in Octave.
This ultimately led to me submitting a barrage of bug-reports and feature requests about sparse matrices.

    • If not, where would you expect or advise us to do advertising?

I know that Matlab is still the de-facto standard which seems odd to me since Octave is free and (I think) a better language. I much prefer endif, endfor, endwhile, endfunction over end, end, end, and <nothing> for which I think the latter is a recipe for buggy code. When I talk to my friends and colleagues, they all seem to be under the impression that Octave is generally slower than Matlab (although I find no significant difference in efficiency). I think perhaps heavy m-script code can be slower, but not if it's well-vectorized. I see good vectorization as an enticing challenge whereas I think many people would rather just write their damn for-loops and get on with it. Perhaps the option to write oct-files needs to be advertised better. I only just found out about them.

  • What was the first question concerning Octave you could not find an answer to rather quickly?

I don't remember, but looking at my SO history, this was the first Octave-related question I asked:
http://stackoverflow.com/questions/14699667/why-does-diag-exhibit-inconsistent-behavior-in-octave
It turned out to be a bug and I think it's been fixed
Next is this one:
http://stackoverflow.com/questions/15393161/vector-unpacking-for-octave
and looking at it again I realize now that the answer given by yuk doesn't fully answer my question. A proper answer should mention this page of the Octave documentation:
https://www.gnu.org/software/octave/doc/interpreter/Comma-Separated-Lists.html
and probably quote the section: Comma separated lists cannot be directly manipulated by the user. However, both structure arrays and cell arrays can be converted into comma separated lists, and thus used in place of explicitly written comma separated lists.

P: Prerequisites[edit]

  • Please state the operating system you work with.

32-bit Ubuntu 12.04

    • If you have access to more than one, please state them and the conditions under which you are granted this access.

I have Windows 7 on the other partition, but I don't use it frequently and certainly would take some time figuring out how to write and execute code in Windows. (I believe I did it four or five years ago, but who can really remember back to the days before programmers universally used Ubuntu)

  • Please estimate an average time per day you will be able to access
    • an internet connection

Always, my phone has 3G.

    • a computer

I usually take my backpack with my laptop to the school, but when I don't, I can use the school computers.

    • a computer with your progressing work on

I commit all my work to a repo on bitbucket so I can access it from school computers as well. I've set up eclipse on one of them and eclipse tunnels surprisingly well over ssh -X, so that means any school computer is fine. If working on Octave the same would hold so long as I can create my own branch. I don't currently know how to do that in Mercurial but I'd be happy to learn.

  • Please describe the degree up to which you can install new software on computers you have access to.

I can install software on my laptop. To get it on the school computer I'm using, I have to email helpdesk and wait until Wednesday (that's when they do software updates) and that's only if it's available in the Ubuntu standard repo (however I have already done this for all of Octave's dependencies and set up the dev version of Octave in the scratch directory without root permission). But, in the worst case I also have an Amazon EC2 account I could use instead (a single EC2 instance is not expensive enough to warrant concern). I could probably set up everything on there as well.

S: Self-assessment[edit]

  • Please describe how useful criticism looks from your point of view as committing student.

I would welcome criticism. I know that I have a lot to work on as a programmer and I have virtually no experience documenting and testing code for a project intended for public consumption.
I've never received any significant review or criticism of my code, so I don't know for sure. There was a guy in the school's Programming Club (for competition programming) who used to set up practice competitions and look over our solutions when they didn't work (which indicates incredible dedication), but those were competition problems. They're a very different thing from production code.

  • How autonomous are you when developing?

I'm happy to work on my own. It's the only way I'm really used to working. But as I mentioned earlier, a couple times I've had the opportunity to pair-code with another strong programmer and I've found it to be more enjoyable. I've found I can write much faster with a peer looking over my shoulder since it relieves a lot of the worry that I'm inadvertantly introducing a bug whenever I write something. If I could work with another strong programmer, I'd love to, but I don't generally expect it.

    • Do you like to discuss changes intensively and not start coding until you know what you want to do?

No, I much prefer to jump right in and figure out the details as I go. I understand that having a roadmap can be important, but I think top-level functions serve as a much better, clearer roadmap than anything in English could. If the code really starts to look messy, I'm happy to scrap the whole thing and rewrite it. The real time-consuming work happens in your head, not on the keyboard. And I'm not scrapping that part.

    • Do you like to code a proof of concept to 'see how it turns out', modifying that and taking the risk of having work thrown away if it doesn't match what the project or original proponent had in mind?

Yeah, that sums it up pretty well.

Y: Your task[edit]

  • Did you select a task from our list of proposals and ideas?

Yes

    • If yes, what task did you choose? Please describe what part of it you especially want to focus on if you can already provide this information.

The Sparse Matrices Project: http://wiki.octave.org/Projects#Sparse_Matrices I want to help improve Sparse Matrix support. Particularly, I wanted to build a sparse matrix library tuned towards graph operations. I think that people often underestimate how many complex graph operations are possible with sparse matrices. In many cases, doing something like a breadth-first search can be equivalent to an operation which is almost just a matrix multiply.
On the mailing list, I gave an example of a program that takes an undirected graph and finds the set of all maximal cliques. As I've mentioned, I've done a lot with this sort of thing, but constantly find myself working around gaps or bugs. I want to try to fill those gaps and build this graph library on top of it.
UPDATE: After some discussion on the mailing list, one project Jordi suggested was just to get Octave Sparse-Matrix indexing working properly. This seems feasible since sparse column arrays are already stored in compressed format (and indexing generally requires converting back and forth between matrix and column format, but there shouldn't be any need to use row vectors). My first thought for how to approach this is simply to change the way sparse matrix dimensions are represented so that the vertical dimension is an unsigned long long int. If this turns out to be infeasible, I'll look at other possibilities.

  • Please provide a rough estimated timeline for your work on the task.

Honestly, the sparse-indexing problem doesn't sound like something that will take up three full months to complete, but I acknowledge that I haven't done much work on the Octave source yet, and can't properly judge this. If it doesn't, there are certainly many other sparse matrix features that have yet to be implemented which I would enjoy working on.

Codebase Consolidation[edit]

This is a list of things which I think need to be consolidated and cleaned up in the Octave codebase. Before embarking on my project, I'd feel obliged to handle each of these.

  • Permutation Matrices are not index vectors. They shouldn't subclass Array<octave_idx_type>. They should instead contain a private instance of Array<octave_idx_type>. Right now element_type and xelem are both wrong, and elem, checkelem, operator() all return octave_idx_type when they should return bool (for logical correctness) or double (for Matlab compatibility)
  • Diagonal matrices, permutation matrices and compressed column sparse matrices are all different types of sparse matrices. There needs to be an AbstractSparse class which is the base of DiagMatrix, PermMatrix and Sparse. In particular, all three types should have an nz_iter which iterates over its non-zero elements in column-major order.
  • Permuation Matrices should always be column-major. Every other matrix type in the entire code-base is. This bool _colp just adds unnecessary complication without granting any measurable performance boost except in extremely contrived situations.
  • The cumulative forms of max, min, all, any, sum, and product (and there should also probably be some sort of cum_xor) should all call the same function internally. They all are forms of reduce with a different binary operation. Right now, the implementations miss a lot of cases. Ex try:

max(sprand(1000000000, 20, 0.000001), [], 2);
This fails with an OOM because it wasn't considered when writing max

This sort of aggregate function (over the rows) can be done in O(min(h+nnz, nnz*log(w))) time, but one would have to implement the algorithm for each of the above-mentioned 7 functions. By consolidating these into one templated function, it'll make it worth the effort to implement.

  • There has to be a standard way to take an octave_value argument and call the proper template instantiation of a function based on its type. I started doing this with https://savannah.gnu.org/patch/?8417 , but diagonoal matrices still need to be added. The fact that the original implementer of find neglected to handle diagonal matrices should be evidence enough that the ad-hoc approach doesn't work.