Wednesday, January 27, 2010

SVN LoC and churn metrics


Wrote a small Perl script to grab svn lines-of-code metrics (added, modified, deleted) and churn (added + modified), as well as the number of files added, updated, or deleted, by revision. The output is a fixed-width flat file with one row per revision: timestamp, username, revision, lines added, modified, churned, and deleted, then files added, updated, and deleted.



Using that raw data it's a quick thing to parse it any way you like, such as applying math to predict your defect rate or simply graphing it over time.



Here's the script, creatively named svnloc:



#!/usr/bin/perl
use strict;
use warnings;

my $BARSIZE  = 40;          # Size of the progress bar
my @statuses = qw(A U D);   # File statuses counted from "svnlook changed"

my $repo     = shift();
my $outfile  = shift() || "./svnloc.txt";
my $revision = shift();
my $latest_rev;
my %rev_users;
my %rev_dates;
my %rev_changes;
my %rev_diff;

if (not defined $repo or not -e $repo) {
    print <<END_USAGE;
Usage: svnloc repo [outfile [revision]]
  repo      the path to the svn repository
  outfile   the path for the output file, defaults to "./svnloc.txt"
  revision  if specified, will append data for that revision to the output
            if not specified, all data for all revisions is obtained and the file
            is generated from scratch, overwriting the old file if it exists.
END_USAGE
    exit(1);
}

my $bl_filename = "svnloc.blacklist";
my @blacklist; # Don't count these revisions
if (-e $bl_filename) {
    open my $bl, '<', $bl_filename or die "Cannot read $bl_filename: $!";
    chomp(@blacklist = <$bl>);
    close $bl;
}

if (defined $revision) { # get info for our revision and append to output file
    get_info($revision);
    open(OUTPUT, '>>', $outfile) or die "Cannot append to $outfile: $!";
    output_line($revision);
    close OUTPUT;
} else { # generate output file from scratch
    my $history = `svnlook history $repo`;
    ($latest_rev) = $history =~ /(\d+)/s; # first number listed is the most recent revision
    print "Latest revision: $latest_rev\n";

    rev_loop("Obtaining revision information...", \&get_info);

    open(OUTPUT, '>', $outfile) or die "Cannot write $outfile: $!";
    # Header widths match the data rows written by output_line
    printf OUTPUT ("%-20s%-18s%6s%7s%7s%7s%7s%5s%5s%5s\n",
        "Date", "Username", "Rev", "Add", "Mod", "Chrn", "Del", @statuses);

    rev_loop("Generating output file ($outfile)...", \&output_line);

    close OUTPUT;
}
print "Finished.\n";

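sub get_info {
    my $rev = shift;

    # The first two lines of "svnlook info" are the author and the date
    my $info = `svnlook info -r $rev $repo`;
    my ($user, $date) = split(/\n/, $info);
    $rev_users{$rev} = $user;
    $rev_dates{$rev} = $date;

    # Count changed files by the status letter in the first column
    my $changed = `svnlook changed -r $rev $repo`;
    for my $s (split(/\n/, $changed)) {
        my $status = substr($s, 0, 1);
        $rev_changes{$rev}->{$status}++;
    }

    my $diff = `svnlook diff -r $rev $repo`;

    # Each '+' line that follows a pending '-' line is paired with it and
    # counted as a modification; unpaired '+' lines are additions and
    # unpaired '-' lines are deletions.
    my ($added, $modified, $deleted, $temp_deleted) = (0) x 4;
    for my $line (split(/\n/, $diff)) {
        my $c2 = substr($line, 0, 2);
        my $c  = substr($line, 0, 1);
        # ignore header lines (this also skips changed lines whose own
        # text starts with '-' or '+')
        next if ($c2 eq '--' || $c2 eq '++');
        if ($c eq '-') {
            $temp_deleted++;
        } elsif ($c eq '+') {
            if ($temp_deleted) {
                $temp_deleted--;
                $modified++;
            } else {
                $added++;
            }
        } else {
            $deleted += $temp_deleted;
            $temp_deleted = 0;
        }
    }
    # Flush deletions still pending when the diff ends on a '-' line
    $deleted += $temp_deleted;

    $rev_diff{$rev}->{added}    = $added;
    $rev_diff{$rev}->{modified} = $modified;
    $rev_diff{$rev}->{churn}    = $added + $modified;
    $rev_diff{$rev}->{removed}  = $deleted;
}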

sub output_line {
    my $rev = shift;
    no warnings 'uninitialized';
    printf OUTPUT ("%20s%-18s%6d%7d%7d%7d%7d%5d%5d%5d\n",
        substr($rev_dates{$rev}, 0, 20),
        $rev_users{$rev},
        $rev,
        $rev_diff{$rev}->{added},
        $rev_diff{$rev}->{modified},
        $rev_diff{$rev}->{churn},
        $rev_diff{$rev}->{removed},
        map { $rev_changes{$rev}->{$_} } @statuses);
}

sub rev_loop {
    my ($msg, $code) = @_;
    my $progress;
    print "$msg\n";
    start_progress(\$progress);
    for (1..$latest_rev) {
        tick_progress(\$progress, $latest_rev);
        next if (is_in($_, @blacklist));
        $code->($_);
    }
    end_progress();
}

sub start_progress {
    my $progress = shift();
    $$progress = 0;
    print "[" . (" " x $BARSIZE) . "]\r";
}

sub tick_progress {
    my $progress = shift();
    my $max      = shift();
    my $ticks    = int(($$progress++ / $max) * $BARSIZE);
    my $spaces   = $BARSIZE - $ticks;

    # $_ is the revision number set by the loop in rev_loop
    printf "[" . ("=" x $ticks)
        . (" " x $spaces)
        . "] %-10s\r", $_;
}

sub end_progress {
    print "[" . ("=" x $BARSIZE) . "]\n\n";
}

sub is_in {
    my $item = shift;
    my @list = @_;
    my %seen;
    @seen{@list} = (1) x scalar @list;
    return $seen{$item};
}
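
As an example of slicing the raw data, here's a minimal shell sketch that totals churn per user. It assumes the column layout from the printf format in output_line (date in columns 1-20, username in 21-38, churn in 59-65); the function name churn_by_user is made up for this example, and awk and sort are assumed to be available.

```shell
# Total churn (lines added + modified) per user from svnloc's fixed-width output.
# Offsets follow the output_line format "%20s%-18s%6d%7d%7d%7d%7d%5d%5d%5d":
#   date 1-20, user 21-38, rev 39-44, added 45-51, modified 52-58,
#   churn 59-65, deleted 66-72, then three 5-wide file counts.
churn_by_user() {
    awk '
        /^Date/         { next }          # skip the header row
        length($0) < 65 { next }          # skip blank or truncated lines
        {
            user = substr($0, 21, 18)
            gsub(/ +$/, "", user)         # strip the right-padding
            churn[user] += substr($0, 59, 7)
        }
        END { for (u in churn) printf "%s %d\n", u, churn[u] }
    ' "$@" | sort
}
```

Call it as churn_by_user svnloc.txt, or pipe the file in on stdin. Slicing by column rather than splitting on whitespace keeps the fields straight even though the date column itself contains a space.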

Wednesday, November 11, 2009

Ghosts of the Past

Speaking very broadly, the applications I have written in the past have brought me good fortune. My customers are very happy with them, indeed have come to rely upon them, and they are very vocal about this with my management, leading to job security for me.

That's one side of things.

I didn't set out to make my code difficult to maintain, or to snare my customers in a dependence upon me in order to increase my worth. But I was a much younger programmer when I started out on these projects, and many mistakes were made. I'm not talking about bugs; no system exists that doesn't have its share of bugs, and whether my system has fewer or more than most is not relevant to the point I am trying to make. What's relevant is the lack of rigor in the process I used to maintain and extend these projects throughout their lifetimes.

One project in particular stands above the others in exemplifying the aspects I've described. It's known to my customers as "Remedy Web". Perhaps you're familiar with BMC Remedy Action Request System. My application is a Perl script providing a simplified front-end to the trouble ticket schema, using the ARSPerl API to do its work. Or at least, that's how it started its life.

Now, if you are familiar with the product I'm describing, you may be wondering why we didn't use the Web component that comes with ARS. The immediate answer is simple: The underlying schema was too complex, and the off-the-shelf web component exported this complexity out to the web, whereas we wanted to simplify. This raises the second question: Why not just simplify the schema?

The answer to this question is as simple as it sounds ridiculous: Because I did not have sufficient training and knowledge to do that. I took on a job maintaining the Remedy application based on my strength as a programmer, having had no training or experience with managing an ARS Schema or application. So, as they say, when all you have is a hammer... I took the challenge of simplifying the interface and beat it into submission using Perl/CGI and the ARSPerl API.

This worked surprisingly well, for some time. In fact, using Perl extensions, I was able to accomplish much which (at the time) Remedy corporation would have wanted to sell us as extra components. Email engine? Perl script on a cron job. Knowledge base integration? Javascript-based checklists. Client self-service, web submission, and customer satisfaction surveys? Another volley of Perl/CGI and MySQL. I even interfaced with our institution's central directory to identify our clients using Perl libraries lovingly crafted by some of my colleagues. And for reporting, should we train our people on Crystal Reports? Nah, more Perl/CGI.

And all of this was done without version control, unit tests, and almost all without proper bug tracking.

When all was said and done, I was able to accomplish everything that the customers wanted, and they were well pleased that I had not only met but exceeded their requirements in nearly every way.

Except for the fact that it broke sometimes.

After all those layers of complexity, what started as a 500-line Perl CGI script ended up as a 20KLOC array of CGI scripts, cron scripts, modules, Template Toolkit files, CSS and JavaScript. And the only one who knew how to keep the entire thing working properly was yours truly. So when it broke, word quickly propagated to me, and I fixed it, even though sometimes it would break because of service failures in DB2 or the ARS server (maintained by another group). Still, everyone knew that if "Remedy Web" went down, they needed to get in touch with me right away.

But that wasn't the worst part. The worst part was that they took this system which was developed for them, and started to offer it (free of charge) to other groups in our institution for their use. Soon groups all over campus were using this chimera of Perl, CGI, JavaScript and ARS. Sure, the needs of each group were distinct and unique, and some customization was needed, but in the end they were all willing to use this system because it was free, and because once they used it they could easily integrate with other groups also using it.

And amazingly enough, as of now, all of their tickets go into the same old schema we started with (with some minor additions), dated 2002. Of course, now we've got the system under a build, in SVN, and using a bug tracker (I'm still working on the unit tests ;)

Some time ago, I was promoted and am now in the position of senior programmer. I was instructed by administration to shed my responsibilities for programming the "Remedy Web" system and pass them on to someone else.

Now's my last chance.

If the system is handed over as-is, it will continue to be used, as-is, until it breaks very badly. Since I'm still close at hand, it will continue to be my task to keep it on life-support using a fraction of my powers, but I will no longer be able to devote the time needed to maintain it in perpetuity. Eventually, something will give, and they will either have to hire an ARS system expert, or they will have to buy into some other product and hire someone to maintain that.

Either that, or we can rebuild it now, while there is still time. I can empty my closet of this skeleton and hand off a better codebase. I'm sure it will be far from perfect, but it will be possible for another person to maintain it. I'll be able to work with my colleagues on a system that's collectively owned, and that we can take some measure of pride in together.

It will use a new schema. It'll do what the customers need it to do. And I'll finally be free from the spectre of a product developed by a foolish young developer who couldn't say "No" to his customers, who thought "mv index.cgi index.cgi.old" was a versioning system, who thought "Bugzilla" was a B-movie, and who thought unit testing involved an oscilloscope (no offense to the hardware folks).

I know it's said to be a vice of programmers to want to throw away a project and start again once it's finally finished. But in this case, the only other choice is to wait until it's so far gone that it has to be thrown away. I feel that would be a waste for myself and my customers.

Anything worth doing once is worth doing twice to get it right.

After all, isn't that what major version numbers are for?

Wednesday, February 11, 2009

Vim Color Schemes

Since deciding to get into Vim, I seem to have wasted a considerable amount of time on color schemes! After discovering how to get 256 colors in my terminal window (Hint: It works with PuTTY out of the box, just set your TERM environment variable to xterm-256color), I decided to put together my own 256 color themes.  Well, sort of.  I actually just copied the code from the Wombat theme by Lars H. Nielsen and modified the colors.

This is the one I'm using by default. It's high-contrast, and it seems to work very well for the Perl/Template/JS/HTML editing I am doing most of the time. On account of the many bright colors (which I'm sure other people will think look ridiculous), I call this scheme Harlequin.




Inspired by the green/brown/white colors deployed by the marketing droids of the on-site coffee vendors where I work, I have created this scheme named Starbucks.




And of course to round things out and bring some balance, here is this truly evil dark-side scheme which I call Magma.




Of course, there are those who say that Starbucks® is the true evil, but I digress...

Friday, February 6, 2009

SVN deletion goodness

The process of getting a project which was not under version control into SVN can be a chore.  Usually the lack of source control has forced the creation of loads of temporary and backup files with silly names.  The easiest thing to do is to simply import the whole mess into the repository, and then go back and clean it up later.

That's what I was doing earlier this morning.  I checked out a copy and started trimming, and by the time I was ready to commit, I realized I had been accidentally deleting files directly in the shell instead of using svn delete.  Oops!  Now I have to go back and re-delete them.  But they're gone, and there were probably a hundred files and directories removed.  Won't that be a huge pain?

Not really.  A little shell one-liner will take care of it for you:

svn status | grep '^!' | awk '{print $2}' | xargs svn del

Run that from the root of your working copy, and it will do the following:

  • Give you the status of all the files and directories in your working copy compared with the repository.
  • Extract only those lines which start with !, which is svn status's way of saying "Oh noes, I can't find that one!"
  • Feed those lines into awk so that it can get the second item on the line, the path.
  • Use xargs to run svn del on each of those paths.

Now all of your deletes will be properly reflected in the repo at the next commit.  Phew!
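
One caveat: awk's $2 stops at the first space, so the one-liner trips over paths that contain spaces. Here's a hedged variant that cuts the path out by column instead. It assumes the status layout of svn 1.6 and later (seven one-character status columns, a space, then the path) and an xargs that supports -0; the function names are made up for this sketch.

```shell
# Print the paths that "svn status" marks as missing ("!" in column one).
# Reads status output on stdin, so the filter can be tried without a working copy.
missing_paths() {
    # Assumed layout: seven status columns plus one space, path from column nine.
    sed -n 's/^!.\{7\}//p'
}

# Schedule every missing path for deletion. Paths are NUL-separated on the way
# to xargs so that embedded spaces survive.
svn_delete_missing() {
    svn status | missing_paths | tr '\n' '\0' | xargs -0 svn delete
}
```

Running svn status | missing_paths by itself first is a cheap way to eyeball the list before anything gets scheduled for deletion.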

Friday, January 16, 2009

Chat Hacking, Part II

So, got it to work.  Turns out we were both right:  Danny was correct in that we weren't using the JSJaC library properly, and I was right in that the server detected our switcheroo and didn't want to talk to us.

Firstly, we discovered there's an internal (but public) method on JSJaC's connection object called inherit, which allows you to utilize an existing http-bind session when you fire up the chat engine.  That turns out to be the right way to do things.  It expects a number of arguments passed in an argument object, three of which are vital to convincing the server that you are who you say you are: 

  • sid: (Session ID) This is generated by the server and sent in the connection phase. We already had this working fine in Perl, so no problems here.
  • key: The key is a hex-encoded sha1 hash, and it's used to verify that each subsequent request comes from the same client.  How?  Well, the keys form a chain in which each key is the sha1 hash of the one generated before it, and the client spends them in reverse order, so the sha1 hash of each key you send must equal the key you sent last time.  If you transmit the wrong key, the server barfs on you.
  • rid: (Request ID) This is simply a sequential number, but it's important with respect to the key.  If your request ID is not in lock-step with your key sequence, the server again will barf on you.

Getting the key right was the trickiest part, but not too bad.  Essentially, JSJaC by default generates a list of 16 keys at a time to use.  Since we weren't initializing the session using JSJaC, we instead had Perl initialize those keys and use the first few to establish a connection.  The key list then gets injected into the web page where JSJaC can pick it up.  As long as there are no fencepost errors, the whole sequence proceeds along without a hitch, and the server happily talks to the JavaScript.
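
To make the key chain concrete, here's a rough sketch in shell (not the Perl from the actual project). It assumes GNU coreutils' sha1sum is installed, and make_key_sequence and key_follows are names invented for this example. As I read XEP-0124, the client hashes forward from a secret seed to build the list, then spends the keys from the end of the list backwards, and the server accepts a key when its sha1 hash equals the key it received before.

```shell
# Build a key sequence: key 1 is the SHA-1 hex digest of the seed, and each
# later key is the SHA-1 hex digest of the key before it.
# Prints one key per line, in generation order.
make_key_sequence() {
    seed=$1
    count=$2
    key=$seed
    i=0
    while [ "$i" -lt "$count" ]; do
        key=$(printf '%s' "$key" | sha1sum | cut -d' ' -f1)
        printf '%s\n' "$key"
        i=$((i + 1))
    done
}

# The server-side check: a newly received key ($1) is valid when its SHA-1
# hex digest equals the key received on the previous request ($2).
key_follows() {
    [ "$(printf '%s' "$1" | sha1sum | cut -d' ' -f1)" = "$2" ]
}
```

With a list of 16 keys, the last line printed is the first key sent and the first line is the last one spent, which is exactly where the fencepost errors like to hide. The request ID has to advance by one each time a key is used.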

It's actually not as fragile as we were afraid it might be, since the session/key/rid setup makes sure that each session is unique and can only be utilized in JS by the CGI that initiated it.  And it makes sure that all the credentials required for login are safely tucked away on the server side, where clients can't see them at all.  Now all we have to do is tweak (read: restrict or rewrite) the JWChat interface a little to give it fewer features, and add a bit of conversation logging, and we've basically accomplished what we set out to do.  

Which means we can have anonymous clients talking to our people on our internal chat server, in a controlled environment and without compromising any credentials.

Thursday, January 15, 2009

Chat Hacking

In a blog post by the same name on our (internal) bug tracking system, my colleague Danny describes our (thus far futile) efforts to create an unholy union between JWChat and a CGI script.  That's right, we're trying to do XMPP chat via the web, which means we're trying to utilize something which implements XEP-0124 (BOSH).

The point is that we want outsiders to be able to talk to our people who are using the internal XMPP server.  Trouble is, the XMPP server requires local credentials to log in (LDAP), so we need them to log in via dummy accounts.  But, we don't want to just hand out the user/pass to those dummy accounts by putting them into the JavaScript source of a web-based chat client.  Hence, the madness begins.

One of our system administrators had the lovely idea of using a CGI script to initiate the connection, keeping our credentials server-side, over an http-bind proxy.  Then we'd hand the session data to our JavaScript client-side, again through the proxy, hopefully unbeknownst to the chat server.

Yeah, it's not going so well.  As in, not at all.  For one thing, there are no existing libraries to implement XEP-0124 in Perl.  We can do regular socket connections, sure, but not BOSH.  So, we were faced with either using a library from another language (like JavaScript or C++) inside a Perl wrapper, or just "faking" the BOSH process by sending some pre-formatted XML over an LWP connection to the http-bind port.  

Well, that part actually worked (surprisingly enough): we can connect server side and get session info.  But as of right now, we can't inject that info into the JavaScript client side and have it pick up the ball, so to speak.  I think maybe the server somehow can tell that we've pulled a fast one on it, and it's not willing to talk to the client masquerading as the server.  Danny thinks we probably just haven't covered all of our bases in initializing the JavaScript chat engine.  We're proceeding under the assumption that he's right, and it's still possible to get this to work.
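
For the curious, the "pre-formatted XML" for opening a BOSH session is small. Below is a minimal shell sketch of a session-creation body based on my reading of XEP-0124, not our actual Perl/LWP code. The function name, the domain, the port and the URL are all placeholders, and a real XMPP server will likely want further attributes than the bare minimum shown.

```shell
# Build the XML body of a BOSH (XEP-0124) session-creation request.
# $1 is the XMPP domain and $2 is the initial request ID; both are placeholders.
bosh_session_request() {
    printf "<body rid='%s' to='%s' xml:lang='en' wait='60' hold='1' ver='1.6' xmlns='http://jabber.org/protocol/httpbind'/>\n" "$2" "$1"
}

# Hypothetical usage (host, port and path are invented for illustration):
# bosh_session_request chat.example.org 1573741820 |
#     curl -s -H 'Content-Type: text/xml; charset=utf-8' --data-binary @- \
#          http://chat.example.org:5280/http-bind/
```

The server's reply to a request like this carries the sid that every later request has to quote.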

Only time will tell if this crazy scheme of ours can work.

Tuesday, January 13, 2009

Design by Contract

So, trying to look into Design by Contract, the virtues of which are described by the Pragmatic Programmers, numerous stackers, and others.  But my inquiry on StackOverflow has yet to yield any answers.  

I'm contemplating just trying out Moose, and writing some extension module for it to support the concepts behind DbC, but since I'm new to both Moose and DbC, I don't know that it'd be worthwhile.  If I do, I'll post my plan here.  But in the meanwhile, I hope that somebody can give me some kind of answer!