Oct 03 2011
 

Pulling and pushing with git can be a bit verbose.  This post explains how to get from git pull --rebase origin master and git push origin master to just typing git pull and git push.

Rebase

First, make every new tracking branch rebase on pull. (Why rebase? It makes your history easier to understand.)

git config --global branch.autosetuprebase always

This is explained in more detail (and with other helpful hints) in Mislav Marohnić’s post A few git tips you didn’t know about.

Tracking

Second, make git push only send the current branch to its matching upstream (aka tracking) branch. (Otherwise the default behavior is to push all branches that have the same name on both ends.)

git config --global push.default upstream

This is covered in some detail in Mark Longair’s post An asymmetry between git pull and push.

On any existing branches, you can set up tracking by doing an explicit push:

git push -u origin branchname

Default refspec

At this point you should be set according to all the tutorials I came across. In my experience, however, this only works for branches other than master. A plain git push on master yields the error:

fatal: The current branch master has multiple upstream branches, refusing to push.

The solution is to set the default refspec for git push. I’m unclear on why this is needed for master but not for other branches.

git config remote.origin.push HEAD

Jun 15 2011
 

Using a bookmarklet to store passwords is appealingly simple. Alas, after doing some digging, I couldn’t find any viable options.

The first concern I came across is that it is important to use a hash algorithm that’s slow (e.g. bcrypt or scrypt). Otherwise it’s too easy to brute-force the master password based on a site password. Suppose a site you visit stores your password in plaintext and gets hacked. That breach then compromises your master password, even though only your site-specific password was revealed.
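
To make the idea concrete, here is a minimal sketch of deriving a site password from a master password with a deliberately slow hash. It is in Python rather than bookmarklet JavaScript, and it assumes a Python build whose hashlib provides scrypt. The function name site_password, the use of the site name as the salt, and the cost parameters are all illustrative assumptions, not taken from any of the tools discussed here.

```python
import base64
import hashlib


def site_password(master, site, length=16):
    """Derive a per-site password from a master password using scrypt.

    Hypothetical helper for illustration only; the cost parameters
    (n, r, p) are assumptions, not a vetted recommendation.
    """
    key = hashlib.scrypt(
        master.encode("utf-8"),
        salt=site.encode("utf-8"),  # assumption: the site name acts as the salt
        n=2**14,                    # CPU/memory cost; a larger n makes each guess slower
        r=8,
        p=1,
        dklen=32,
    )
    # Encode the raw bytes as printable characters and trim to the wanted length.
    return base64.urlsafe_b64encode(key).decode("ascii")[:length]


if __name__ == "__main__":
    print(site_password("correct horse battery staple", "example.com"))
```

The point of the slow hash is that an attacker who learns one site password must pay the scrypt cost for every master-password guess, rather than testing millions of MD5 candidates per second.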

I couldn’t find a JavaScript implementation of scrypt, but I found a JavaScript bcrypt implementation. Better yet, I found a derivative that tidies up the first one, removing dependencies on e.g. Clipperz, and wraps it in a simple bookmarklet. SuperGenPass provides a much more user-friendly bookmarklet, so I started gearing up to replace its MD5 hashing with bcrypt.

But, alas, SuperGenPass (and any other simple bookmarklet) is not secure in the face of a malicious website that contains JavaScript designed to sniff entry of the master password into the bookmarklet. PwdHash is a browser-extension-based approach from the Stanford Security Lab designed to combat the weaknesses of the bookmarklet-based approach. Their paper, Stronger Password Authentication Using Browser Extensions, is interesting reading and explains a variety of ways to compromise a bookmarklet-based approach. PwdHash has already spawned a number of ports to other browsers and mobile devices, but they’re all based on prototype code that uses the undesirably fast HMAC-MD5 as the hashing algorithm (even though the paper points out PwdHash is a good candidate for a better hashing algorithm).

I was not able to find any PwdHash derivative that used bcrypt. I did find a simple command-line tool based on scrypt, but that’s not great if you don’t have easy access to your own computer.

Solutions like PassPack offer the potential to solve these problems (extension rather than bookmarklet, use of strong encryption rather than weak hashing), but have an Achilles heel of their own: the service provider has the power to decrypt all your passwords. For now I’ll stick with my moinmoin-client-crypt approach.

UPDATE 2012-05-19: PassPack does not store your packing key on their servers after all. (LastPass does not either, nor does Clipperz.) But you still must trust them, as they are in a position to insert backdoors into either the browser add-ons or web-based access they provide. This is less of an issue with Clipperz, since you can run the Community Edition on your own hardware. Some brief comparisons here and here. Also there is some interesting discussion in the comments of the previously linked PassPack critique. Gabriel Weinberg has LastPass amongst his list of services used at DuckDuckGo. LastPass did possibly have a data breach, but they handled it well. Some more details on PassPack’s packing keys and master keys.

Jun 052011
 

Quick refresher: self-types are commonly used when writing traits that need to require that they get mixed in to a particular class. For example, the cake pattern leverages them. In the example below, FooTrait specifies a self-type of FooTraitConfiguration to ensure that it is mixed in to a class that provides the expected times val.

import actors.Actor
import actors.Actor._

trait FooTraitConfiguration { val times : Int }

trait FooTrait { self:FooTraitConfiguration =>
  case object Ping
  case object Pong

  val a = actor {
    loop {
      react {
        case Ping =>
          self ! Pong
        case Pong =>
          for(_ <- (1 to times)) { print(".") }
          System.out.println("pong.")
  } } }

  def ping = a ! Ping
  def pong = a ! Pong
}

class Foo extends FooTrait with FooTraitConfiguration { override val times = 5 }

But, alas, this fails to compile:

error: value ! is not a member of FooTrait with FooTraitConfiguration
self ! Pong

It seems that the self-type has broken the Actor API! And indeed it has, because the self-type should not have been declared with the name self. It should have been:
this:FooTraitConfiguration =>

The self-type means that within FooTrait the type of this is considered to be FooTrait with FooTraitConfiguration. Using a word other than this additionally sets up an alias to that type for e.g. use within nested classes. And there’s the rub: Actors depend on a method named self which is shadowed when the alias to the type is named self.

Note to self: Don’t use self when specifying self types!

Apr 12 2011
 

Yesterday I needed to decipher a log file in which a dozen threads were simultaneously logging messages. Surely there must be tools for this out there. But I couldn’t find one, so I wrote a Python script to indent each line differently based on thread id. I then looked at it in LibreOffice, but just reading on the terminal would have sufficed. Here’s a trivial demo:

2011-04-11 09:40:12,004 [INFO] [http-3] Hello
2011-04-11 09:40:13,554 [DEBUG] [http-1] Wikipedia
(pronounced /ˌwɪkɨˈpiːdi.ə/ WIK-i-PEE-dee-ə)
2011-04-11 09:40:13,605 [INFO] [http-2] PCC Natural Markets
2011-04-11 09:40:13,688 [INFO] [http-3] World
2011-04-11 09:40:14,015 [INFO] [http-2] began as a food-buying club of 15 families in 1953.
2011-04-11 09:40:16,032 [INFO] [http-1] is a multilingual, web-based, free-content encyclopedia project based on an openly editable model
2011-04-11 09:40:17,775 [INFO] [http-2] Today, it's the largest consumer-owned natural food retail co-operative in the United States.

becomes:

09:40:12,004	Hello
09:40:13,554		Wikipedia
09:40:13,554		(pronounced /ˌwɪkɨˈpiːdi.ə/ WIK-i-PEE-dee-ə)
09:40:13,605			PCC Natural Markets
09:40:13,688	World
09:40:14,015			began as a food-buying club of 15 families in 1953.
09:40:16,032		is a multilingual, web-based, free-content encyclopedia project based on an openly editable model
09:40:17,775			Today, it's the largest consumer-owned natural food retail co-operative in the United States.
            	http-3	http-1	http-2

Just read vertically down each column for a clear chain of events on each thread. Code below is maintained on GitHub.

#!/usr/bin/env python

# Looks at token in a particular position in each line and indents the line
# differently for each unique identifier found in the file. For example, given
# a log file which contains a thread identifier, contents for each thread will
# be separated out into distinct columns.
#
# Lines not matching the pattern (e.g. stack traces) are presumed to have
# occurred at the time of and belong to the same identifier as the preceding line.
#
# Default pattern is: <date> <stamp> ignored [thread_id] <message>
# Yielding output: <stamp><tabs><message>
#
# An alternate regular expression can be supplied on the command line; it must
# include named capture groups 'stamp', 'id', and 'message'. The default regex
# is: ^\S+ (?P<stamp>\S+) \S+ \[(?P<id>[^\]]+)\] (?P<message>.*)
#
# If the input contains very long lines it can be helpful to truncate them
# beforehand by e.g. piping through awk '{print substr($0,0,400)}'

import sys,re
if len(sys.argv) > 1:
  pattern = re.compile(sys.argv[1])
else:
  pattern = re.compile(r'^\S+ (?P<stamp>\S+) \S+ \[(?P<id>[^\]]+)\] (?P<message>.*)')

delimiter='\t'
max_level = 1
categories = {}
legend = None
indent = ""
stamp = ""

try:
  for line in [l.strip() for l in sys.stdin]:
    m = pattern.match(line)
    if m:
      stamp,identifier,message = [m.group(x) for x in ['stamp','id','message']]
      indent = categories.get(identifier)
      if not legend:
        legend = " " * len(stamp)
      if not indent:
        indent = delimiter * max_level
        categories[identifier] = indent
        max_level += 1
        legend += delimiter + identifier
      print(stamp + indent + message)
    else:
      # carry over stamp and indent from previous line
      print(stamp + indent + line)

  print(legend)
except IOError:
  pass
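
As a quick sanity check of what the default pattern captures, here is a small sketch that runs it against two lines from the demo above. The variable names fields and continuation are just for illustration.

```python
import re

# Default pattern copied from the script above (written as a raw string).
pattern = re.compile(r'^\S+ (?P<stamp>\S+) \S+ \[(?P<id>[^\]]+)\] (?P<message>.*)')

# A regular log line: date, time stamp, level, thread id, message.
line = "2011-04-11 09:40:12,004 [INFO] [http-3] Hello"
m = pattern.match(line)
fields = (m.group('stamp'), m.group('id'), m.group('message'))
print(fields)  # ('09:40:12,004', 'http-3', 'Hello')

# A continuation line does not match, so the script reuses the stamp and
# indent of the preceding line for it.
continuation = "(pronounced /ˌwɪkɨˈpiːdi.ə/ WIK-i-PEE-dee-ə)"
print(pattern.match(continuation))  # None
```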

Apr 12 2011
 

Grepping through log files for lines that match a timestamp is fiddly. It’s hard to catch multi-line entries (e.g. stack traces) and to craft a regex that captures an exact time range. I wrote a little Python script to simplify the process.

Usage:

 ./by_time.py 9:40 9:44:15 < example.log

Code below is maintained on GitHub.

#!/usr/bin/env python

# Selects time range from a log file. Lines with no time (e.g. stack traces)
# are presumed to have occurred at the time of the preceding line.
#
# Assumes first time-like phrase on a line is the timestamp for that line.
#
# Assumes time format is pairs of digits separated by colons with optional , or
# . initiated suffix. E.g. HH:mm:ss,SSS, HH:mm, etc.
#
# Does not strip blank lines; just use awk 'NF>0' for that.

import sys,re
time_pattern = re.compile(r"(?:^|.*?\D)(\d{1,2}(?::\d{2})+(?:[,.]\d+)?)")
fields_pattern = re.compile("[:,.]")

if len(sys.argv) < 3:
  sys.stderr.write("Please specify start and end times (e.g. %s 13:50 14:10:01,101).\n" % sys.argv[0])
  sys.exit(1)

for item,index in [["start time",1],["end time",2]]:
  if not time_pattern.match(sys.argv[index]):
    raise ValueError("Cannot parse %s: %s" % (item, sys.argv[index]))

start,end = [[int(x) for x in re.split(fields_pattern, s)] for s in sys.argv[1:3]]
too_soon = True

try:
  for line in sys.stdin:
    line = line.strip()
    m = time_pattern.match(line)
    if m:
      t = [int(x) for x in re.split(fields_pattern,m.group(1))]
      if t >= end:
        break
      elif too_soon and t >= start:
        too_soon = False

    if not too_soon:
      print(line)
except IOError:
  pass
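
The range check works because each time is split into a list of integers, and Python compares lists element by element. A short sketch of that comparison, using a hypothetical helper named to_fields that mirrors the splitting done in the script:

```python
import re

# Same separator pattern as in the script above.
fields_pattern = re.compile("[:,.]")


def to_fields(text):
    """Split a time such as '09:40:13,554' into a list of ints."""
    return [int(x) for x in fields_pattern.split(text)]


start, end = to_fields("9:40"), to_fields("9:44:15")
t = to_fields("09:40:13,554")

print(start, end, t)     # [9, 40] [9, 44, 15] [9, 40, 13, 554]
print(start <= t < end)  # True: 09:40:13,554 falls inside the range
```

Note that a shorter list sorts before a longer one with the same prefix, so a start of 9:40 includes everything from 09:40:00 onward, while the end time itself is excluded.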

Apr 04 2011
 

I tend to have lots of browser tabs open. Even more so with Firefox 4’s Panorama feature, which I find handy despite the frustrating limitation that tab groups cannot be moved between windows. Alas, having many tabs open is suspected to trigger/exacerbate a Firefox memory leak when Adblock Plus is running. My own informal testing corroborates this.

In Firefox 3.6 I had used BarTab, a great plug-in that allowed unloading tabs without closing them to reclaim memory. Alas, it is not yet compatible with Firefox 4.

But it turns out Firefox 4 has a new setting (called Cascaded Session Restore) that provides a reasonable work-around. In about:config set

browser.sessionstore.max_concurrent_tabs=0

…and then, when closed and re-opened, Firefox will not load any tab until you click on it. Now it’s extremely fast to quit, restart, and continue on with reduced memory usage.

Mar 31 2011
 

moinmoin-client-crypt was the fun part of a recent Wiki migration project I did. The tedious prelude was getting the content out of an aging JSPWiki version and into MoinMoin.

After some aborted attempts at translating the JSPWiki source from scratch, I decided the path of least resistance would be to leverage HTML::WikiConverter to translate the HTML output of JSPWiki. This turned out to be time consuming as well. To anyone else going down this path I offer up:

  1. A patched version of HTML::WikiConverter-MoinMoin that includes fixes for intra-wiki links, inline images, horizontal rules, and definition lists.
  2. A collection of scripts, dubbed JSPWiki-translate-perl, for retrieving HTML from JSPWiki, pre-processing it to make it more palatable for HTML::WikiConverter, and for generating a MoinMoin-style directory layout to contain it.

The original author of HTML::WikiConverter-MoinMoin seems to have abandoned it. I can sympathize; I certainly hope to avoid translating another Wiki any time soon!

Mar 30 2011
 

I just posted the first release of moinmoin-client-crypt to GitHub. By way of introduction, here’s an excerpt from the readme:

moinmoin-client-crypt provides client-side encryption/decryption of MoinMoin wiki pages (or portions thereof). It adds encrypt/decrypt buttons to the edit screen, providing an easy mechanism to secure all or a portion of the content. Encryption is via Chris Veness’ Javascript AES implementation (256 bit key, CTR mode).

Installation involves dropping a couple JavaScript files into the appropriate MoinMoin directory and tweaking the theme init file to reference them. Full functionality with modern and classic themes, perhaps slightly degraded on others. It shouldn’t take much tweaking to adapt to other themes; patches and bug reports are welcome!

The client-side JavaScript approach provides some security if the server were to be seized: the AES ciphertext should be extremely difficult to crack. Also, once the browser is closed on the client side, there should be no trace left of the plaintext. However, if the server were compromised, it would be easy to replace moinmoin-client-crypt with a trojan horse. Likewise, if a malicious person were to gain control of the client, they could easily install e.g. a keylogger. In short, you have to trust your client machine, your browser, your connection to the server, and the integrity of the server, as explained here by Nate Lawson. The need for client-side security should be obvious; the server and connection must be trusted not to send/inject a modified version of the script.