Analysing code repositories

I want to compare the activity of non-paid developers in a project before and after the introduction of financial rewards. For this, I'll use commits as a measure of how active people are.

Commits are a useful measurement here because they show real activity. Other activity happens as well (mailing lists, IRC), but code written is commonly treated as shorthand for how active somebody is. Commits record actual activity rather than self-reported activity, so there is less room for (social) lying. They can be analyzed statistically. Most importantly for me, since these projects are currently paying developers, the data goes back years to a time before payment was introduced.
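As a rough illustration of the before/after comparison, here is a minimal sketch, assuming the commit history has already been reduced to (author, date) pairs and that the date rewards were introduced and the set of paid developers are known. The function name, the `cutoff`, `window_days` and `paid` parameters are all hypothetical, not part of any existing tool. Counting only inside an equal-length window on each side keeps the two periods comparable.

```python
from collections import Counter
from datetime import date, timedelta


def commits_before_after(commits, cutoff, window_days=365, paid=frozenset()):
    """Count commits per non-paid author in equal windows around `cutoff`.

    commits: iterable of (author, date) tuples (hypothetical input format).
    cutoff: date the financial rewards were introduced (assumed known).
    window_days: length of each comparison window, in days.
    paid: authors to skip because they are paid developers.
    """
    window = timedelta(days=window_days)
    before, after = Counter(), Counter()
    for author, day in commits:
        if author in paid:
            continue  # only non-paid developers are being compared
        if cutoff - window <= day < cutoff:
            before[author] += 1
        elif cutoff <= day < cutoff + window:
            after[author] += 1
        # commits outside both windows are ignored
    return before, after


if __name__ == "__main__":
    sample = [
        ("alice", date(2007, 3, 1)),
        ("alice", date(2008, 6, 1)),
        ("bob", date(2007, 11, 15)),
        ("carol", date(2008, 2, 2)),  # treated as a paid developer below
    ]
    before, after = commits_before_after(sample, cutoff=date(2008, 1, 1), paid={"carol"})
    print(dict(before), dict(after))
```

The per-author counts from the two windows can then be fed into whatever statistical test fits, for example a paired comparison per developer.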

The data can be gamed; after all, humans will perform to the standard used to measure them. However, this data has already been created, and gaming previously created data is harder (probably too hard for most people to bother).

Methods of repository analysis:

  • a social network for open source developers. They analyze a lot of OSS projects down to the individual contributor level (even across projects) in several kinds of repositories, but data is only available for the last 300 days (due to performance issues).
  • analyzes svn repositories and generates nice images from them. It creates a graph of the number of commits per developer over time, going back as far as the local svn repository allows, but I haven't found the underlying data itself yet. Requires access to the repository and only supports Subversion.
  • has some screens showing statistical graphs. It has the same issues as StatSVN above, compounded by not being a statistical tool, so the data is even less likely to be easily available.
  • works on Subversion repositories (and other things that are not version control systems), but seems to be in a very early stage of development.
  • Looks similar to StatSVN, but for CVS.
  • htp:// Another Subversion analysis program. The charts are not the most intuitive, but they are information-dense. Most output is ASCII-based.
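Since several of these tools draw commits-per-developer graphs without exposing the numbers, one option is to go to the raw history directly. A minimal sketch, assuming repository access and that the log has been exported with `svn log --xml`, whose `<logentry>` elements carry `<author>` and `<date>` children. The function name `monthly_commits` and the sample data are hypothetical; this is not how any of the tools above work internally.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def monthly_commits(root):
    """Return {author: Counter({"YYYY-MM": n})} from a parsed `svn log --xml` tree."""
    per_author = defaultdict(Counter)
    for entry in root.iter("logentry"):
        # Some revisions have no <author> element (e.g. anonymous commits).
        author = entry.findtext("author", default="(no author)")
        stamp = entry.findtext("date")
        if not stamp:
            continue  # skip entries without a timestamp
        month = stamp[:7]  # "2008-01-05T10:00:00.000000Z" -> "2008-01"
        per_author[author][month] += 1
    return per_author


# Hypothetical sample in the shape `svn log --xml` produces.
SAMPLE = """<log>
<logentry revision="1"><author>alice</author><date>2008-01-05T10:00:00.000000Z</date><msg>a</msg></logentry>
<logentry revision="2"><author>alice</author><date>2008-01-20T09:00:00.000000Z</date><msg>b</msg></logentry>
<logentry revision="3"><author>bob</author><date>2008-02-01T12:00:00.000000Z</date><msg>c</msg></logentry>
</log>"""

if __name__ == "__main__":
    for author, months in sorted(monthly_commits(ET.fromstring(SAMPLE)).items()):
        print(author, dict(months))
```

For a real repository, something like `svn log --xml URL > log.xml` followed by `monthly_commits(ET.parse("log.xml").getroot())` should give the same structure; CVS histories would need a different parser.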

