You've been owned
Mar 28, 2013

If you use the timestamptz data type, Postgres stores each value as an absolute point in time (internally in UTC) and does time zone conversions automatically.

First, some test data:

pg=> create table time_test (id text, stamp timestamptz);
CREATE TABLE
pg=> insert into time_test values('foo', now());
INSERT 0 1
pg=> insert into time_test values('foo', now());
INSERT 0 1
pg=> select * from time_test;
id | stamp
-----+-------------------------------
foo | 2013-01-22 00:53:40.325041+00
foo | 2013-01-22 00:54:02.021018+00
(2 rows)

Client-supplied data in other time zones is automatically converted for comparisons:

pg=> select * from time_test where stamp > '2013-01-21 16:54:00 PST';
id | stamp
-----+-------------------------------
foo | 2013-01-22 00:54:02.021018+00
(1 row)
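Purely as an illustration of what the comparison is doing, here is a small Ruby sketch: parse the client's PST literal into an absolute time, then compare instants, ignoring whatever offset each value happens to be written with. The stored stamps are copied from the output above. Time.parse resolves the "PST" abbreviation through Ruby's small built-in table of zone abbreviations (UTC-8); this is not how Postgres implements it, just the same idea.

```ruby
require "time"

# Illustration only: compare timestamps as absolute instants, the way a
# timestamptz comparison behaves. Values are copied from the psql output above.
stored = [
  Time.utc(2013, 1, 22, 0, 53, 40, 325041), # first 'foo' row
  Time.utc(2013, 1, 22, 0, 54, 2, 21018)    # second 'foo' row
]

# "PST" is looked up in Ruby's built-in zone abbreviation table (UTC-8).
cutoff = Time.parse("2013-01-21 16:54:00 PST")

# Only the instant matters: 16:54:00 PST is 00:54:00 UTC the next day.
later = stored.select { |t| t > cutoff }
later.each { |t| puts t.strftime("%Y-%m-%d %H:%M:%S.%6N%:z") }
# prints 2013-01-22 00:54:02.021018+00:00
```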

Results can be converted on the fly:

pg=> select id, stamp at time zone 'PST' from time_test;
id | timezone
-----+----------------------------
foo | 2013-01-21 16:53:40.325041
foo | 2013-01-21 16:54:02.021018
(2 rows)

…once, or for the whole session.

pg=> set session time zone "pst8pdt";
SET
pg=> select * from time_test;
id | stamp
-----+-------------------------------
foo | 2013-01-21 16:53:40.325041-08
foo | 2013-01-21 16:54:02.021018-08
(2 rows)

pg=> insert into time_test values ('bar', '2013-01-21 16:55:03');
INSERT 0 1
pg=> select * from time_test;
id | stamp
-----+-------------------------------
foo | 2013-01-21 16:53:40.325041-08
foo | 2013-01-21 16:54:02.021018-08
bar | 2013-01-21 16:55:03-08
(3 rows)
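A rough Ruby analogue of the session setting, under one loud assumption: a fixed UTC-8 offset stands in for pst8pdt. That only holds for dates in standard time, like these January ones; a fixed offset knows nothing about daylight saving, whereas Postgres applies the zone's full rules. The sketch shows both directions: a stored instant rendered in the session offset, and a zone-less literal like the 'bar' row read as session-local wall-clock time.

```ruby
require "time"

# Assumption: fixed -08:00 offset as a stand-in for the pst8pdt zone.
# Valid for January dates only; it does not model daylight saving time.
SESSION_OFFSET = "-08:00"

# Output direction: a stored instant, displayed in the session offset.
stored = Time.utc(2013, 1, 22, 0, 53, 40, 325041)
shown = stored.getlocal(SESSION_OFFSET)
puts shown.strftime("%Y-%m-%d %H:%M:%S.%6N%:z")
# prints 2013-01-21 16:53:40.325041-08:00

# Input direction: '2013-01-21 16:55:03' carries no zone, so it is read as
# session-local time; this is the instant that ends up stored.
bar = Time.new(2013, 1, 21, 16, 55, 3, SESSION_OFFSET)
puts bar.getutc.strftime("%Y-%m-%d %H:%M:%S%:z")
# prints 2013-01-22 00:55:03+00:00
```

Dropping %:z from the first format string gives the zone-less wall-clock value, which is roughly what stamp at time zone 'PST' returns.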

 

Nov 30, 2012

While working on a deployment process that automatically updated an Elastic IP to point to a new instance, I got to see a lot of these:

Offending key in /Users/lhn/.ssh/known_hosts:45
RSA host key for xxx.yyy.zzz has changed and you have requested strict checking.
Host key verification failed.

Here is a sed one-liner to delete the offending key (on line 45 in this case) from SSH’s known_hosts file. This is a reasonable thing to do when you know why the host key has changed and don’t expect it to do so very often.

sed -i -e '45d' ~/.ssh/known_hosts

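-i is for in-place editing and -e provides the expression, which is to delete line 45. (On macOS, BSD sed’s -i takes the next argument as a backup suffix, so this exact command also leaves a backup named known_hosts-e behind; sed -i '' -e '45d' ~/.ssh/known_hosts skips the backup.)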
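sed -i behaves a little differently between GNU and BSD sed, so here is a small Ruby sketch of the same edit that sidesteps the question: drop one 1-based line and write the rest back. The method name delete_line! is made up for illustration. It rewrites the existing file (so it keeps its permissions) and returns the removed line, so you can see which key went away.

```ruby
# Hypothetical helper: delete 1-based line +lineno+ from +path+ in place,
# roughly what `sed -i -e '45d' path` does. Returns the removed line.
def delete_line!(path, lineno)
  lines = File.readlines(path)
  unless lineno.between?(1, lines.size)
    raise ArgumentError, "#{path} has #{lines.size} lines, no line #{lineno}"
  end
  removed = lines.delete_at(lineno - 1)
  File.write(path, lines.join) # truncates and rewrites the existing file
  removed
end

# Usage, for the "known_hosts:45" warning above:
# delete_line!(File.expand_path("~/.ssh/known_hosts"), 45)
```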

However, sometimes you expect the host key to change frequently, and a better approach is to not check or store the host key in the first place. That can be achieved as follows (kudos to Peter Leung):

ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no user@some.host

This tells SSH to use the always-empty /dev/null as its place to record host keys and not to complain when connecting to a host with an unknown key. Thus no host keys are stored or checked.

At the risk of stating the obvious, this does of course side-step SSH’s ability to protect you from man-in-the-middle attacks.

Oct 25, 2012

The other day I got a low disk space warning because my Thunderbird Inbox had grown to over 100 GB. It turned out my Inbox, Trash, and Sent mailbox folders were all impacted by some bug in which Thunderbird would keep fetching the same messages again and again from the server (IMAP) and appending them to the mailbox file. Compacting the mailbox would recover the disk space, but the mailboxes would start growing again shortly thereafter.

The magic incantation that resolved the problem was a quick succession of compacting the mailbox (right-click -> Compact) and repairing it (right-click -> Properties... -> Repair Folder). I did have Preferences -> Advanced -> Network & Disk Space -> "Compact all folders when it will save over 1 MB in total" enabled, but it wasn’t kicking in.

Oct 11, 2012

Google Music Manager uploads are based on looking for music files in a particular directory. This isn’t helpful if you have a large directory structure of music and want to upload a subset of it. In my case, I want to use Banshee’s smart playlist feature to select songs to upload. Fortunately Banshee has a .m3u playlist export, but this is only half the battle. The other half is to use symlinks to fool Google Music Manager into thinking the songs in the playlist are in its directory.

The following shell command does the trick. It takes input lines in the .m3u of the form <path>/<artist>/<album>/<song> (e.g. ../../../mnt/onion/media/Music/Banshee/Wir Sind Helden/Soundso/01. (Ode) An Die Arbeit.mp3) and makes symlinks of the form <artist>_<album>_<song>.<extension>.

cat ~/my_playlist.m3u | ruby -ne 'IO.popen(["ln", "-s", "#{$&}", "./#{$2[0..50]}_#{$3[0..50]}_#{$4[0..50]}.#{$5}"]) if $_.strip =~ /^([^#].*)\/([^\/]*)\/([^\/]*)\/([^\/]*)\.([^.\/]*)$/'

For each line ($_) that matches the pattern (not starting with #, and containing at least three slashes), it executes ln -s <input line ($&)> <composed filename>. The [0..50] ranges truncate each component to at most 51 characters, which keeps filename length manageable.
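Unrolled into a couple of methods, the one-liner reads like this. It is a sketch that keeps the same regex, the same <path>/<artist>/<album>/<song>.<ext> assumption and the same [0..50] truncation, but calls Ruby's File.symlink instead of spawning ln. link_name and link_playlist are made-up names, and existing links are skipped rather than overwritten. As with the one-liner, playlist paths are used verbatim as link targets, so relative paths only resolve if you run it from the right directory.

```ruby
# Same pattern as the one-liner: not starting with '#', then
# <path>/<artist>/<album>/<song>.<ext>
PATTERN = %r{\A([^#].*)/([^/]*)/([^/]*)/([^/]*)\.([^./]*)\z}

# Returns the link name for one playlist line, or nil for comments
# (e.g. #EXTM3U) and lines without the expected shape.
def link_name(line)
  m = PATTERN.match(line.strip)
  return nil unless m
  artist, album, song, ext = m[2], m[3], m[4], m[5]
  "#{artist[0..50]}_#{album[0..50]}_#{song[0..50]}.#{ext}"
end

# Creates one symlink per matching line, in the current directory.
def link_playlist(m3u_path)
  File.foreach(m3u_path) do |line|
    name = link_name(line)
    next if name.nil? || File.symlink?(name)
    File.symlink(line.strip, name) # like: ln -s <line> <name>
  end
end

link_playlist(ARGV[0]) if __FILE__ == $PROGRAM_NAME && ARGV[0]
```

Saved as, say, link_playlist.rb (a hypothetical file name), it would run as ruby link_playlist.rb ~/my_playlist.m3u from inside the Google Music Manager directory.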

Aug 17, 2012

Solr has a handy ability to ingest local CSV files. The neatest aspect is that you can populate multi-valued fields by sub-parsing an individual field. E.g. the following will ingest /tmp/input.csv and split SomeField into multiple values on semicolon delimiters:

curl 'http://localhost:80/solr/my_core/update?stream.file=/tmp/input.csv&stream.contentType=text/csv;charset=utf-8&commit=true&f.SomeField.split=true&f.SomeField.separator=%3B'
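The same request URL can also be built programmatically, which sidesteps shell quoting entirely. A minimal Ruby sketch using only the standard uri library; host, port, core, file path and field name are the placeholders from the command above, and csv_update_uri is a made-up helper. Deriving both f.<field>.* parameters from one field name keeps them from drifting apart.

```ruby
require "uri"

# Hypothetical helper: builds the CSV-ingest URL from the curl example.
# Every value is form-encoded, so ';' becomes %3B, '/' becomes %2F, etc.
def csv_update_uri(core:, file:, multi_field:, separator: ";", host: "localhost", port: 80)
  params = [
    ["stream.file", file],
    ["stream.contentType", "text/csv;charset=utf-8"],
    ["commit", "true"],
    # Per-field parameters: split this field into multiple values on +separator+.
    ["f.#{multi_field}.split", "true"],
    ["f.#{multi_field}.separator", separator]
  ]
  URI::HTTP.build(host: host, port: port, path: "/solr/#{core}/update",
                  query: URI.encode_www_form(params))
end

puts csv_update_uri(core: "my_core", file: "/tmp/input.csv", multi_field: "SomeField")
```

The printed URL can then be handed to curl in single quotes.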

When running an ingest, I got the following response, which was confusing since myField was, in fact, defined in my schema:

<?xml version="1.0" encoding="UTF-8"?>
<response>
    <lst name="responseHeader">
        <int name="status">400</int>
        <int name="QTime">1</int>
    </lst>
    <lst name="error">
        <str name="msg">undefined field: "myField"</str>
        <int name="code">400</int>
    </lst>
</response>

A peek in the log provided a clue (note the leading question mark):

SEVERE: org.apache.solr.common.SolrException: undefined field: "?myField"

Examining a hex dump of the CSV file revealed that it started with a UTF-8 Byte Order Mark:

xxd /tmp/input.csv | head
0000000: efbb bf...

One way to strip the BOM is with Bomstrip, a collection of BOM-stripping implementations in various languages, including a Perl one-liner. Alternatively, just open the file in Vim, do :set nobomb and save. Done!
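If you're scripting the ingest anyway, stripping the BOM in place only takes a few lines. A minimal Ruby sketch (strip_utf8_bom! is a made-up name): it checks whether the file starts with the bytes EF BB BF, the same bytes xxd showed, and rewrites the file without them. It reads the whole file into memory, which is fine for modest CSVs.

```ruby
# The UTF-8 byte order mark: the "efbb bf" at the start of the xxd dump.
UTF8_BOM = "\xEF\xBB\xBF".b

# Hypothetical helper: removes a leading UTF-8 BOM from +path+ in place.
# Returns true if a BOM was removed, false if the file didn't start with one.
def strip_utf8_bom!(path)
  data = File.binread(path)
  return false unless data.start_with?(UTF8_BOM)
  File.binwrite(path, data.byteslice(UTF8_BOM.bytesize..-1)) # truncates, then writes
  true
end

# Usage: strip_utf8_bom!("/tmp/input.csv")
```

If you only need to read such a file from Ruby, opening it with the mode "r:BOM|UTF-8" skips the BOM without rewriting anything.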

Aug 09, 2012

Given a working Perl 5.12 install (via MacPorts), doing a sudo port install perl5.16 does not update the perl symlink:

% ls -alF /opt/local/bin/perl
lrwxr-xr-x 1 root admin 8 Jun 6 15:19 /opt/local/bin/perl@ -> perl5.12

The magic incantation is to install the perl5_16 variant of the perl5 package:

sudo port install perl5 +perl5_16

With this done, the symlink is updated and perl loads the expected version.

% ls -alF /opt/local/bin/perl
lrwxr-xr-x  1 root  admin  8 Aug  9 12:45 /opt/local/bin/perl@ -> perl5.16
% perl -v
This is perl 5, version 16, subversion 0 (v5.16.0) built for darwin-thread-multi-2level

Aug 09, 2012

Stackable traits in Scala refers to being able to mix in multiple traits that work together to apply multiple modifications to a method. This involves invoking super.theMethod and modifying its input and/or output. But what is super in the context of a trait? The class (or trait) the trait extends from? The class the trait is being mixed into? It depends! All the mixed-in traits and all the superclasses are linearized into a single chain, and super invokes the next definition of the method further up that chain. The general effect is that mixins to the right (and their ancestor classes) come earlier in the chain than those to the left, so the rightmost trait’s method runs first and its super call reaches the traits to its left. However, ancestors that are shared are deduped to only show up once, and they show up as late as possible. Here’s a detailed description of the Scala object hierarchy linearization algorithm.

If a trait which extends MyInterface tries to invoke super.myMethod but MyInterface.myMethod is abstract, the compiler generates this error:

error: method myMethod in trait MyInterface is accessed from super. It may not be abstract unless it is overridden by a member declared `abstract' and `override'

What this means is: generally, invoking an abstract method of a superclass is an error. However, with traits, the meaning of super is not known at compile time. The call would be valid if the trait were mixed into a class that had an implementation of the method, but the compiler errs on the side of caution unless told otherwise. Declaring abstract override def myMethod signals that you expect a concrete implementation to be available from whatever the trait is eventually mixed into, so the super.myMethod invocation is not treated as an error. (Note: this applies regardless of whether the trait itself provides an implementation of the method.)

Here are some examples:

trait Munger {
  def munge(l : List[String]) : List[String]
}

trait Replace1 extends Munger {
  override def munge(l : List[String]) = l :+ "Replace1"
}

trait Replace2 extends Munger {
  override def munge(l : List[String]) = l :+ "Replace2"
}

//abstract override def munge is required in the Stack* traits because they invoke
//the abstract super.munge

trait Stack1 extends Munger {
  abstract override def munge(l : List[String]) = super.munge(l) :+ "Stack1"
}

trait Stack2Parent extends Munger {
  abstract override def munge(l : List[String]) = super.munge(l) :+ "Stack2Parent"
}

trait Stack2 extends Stack2Parent {
  abstract override def munge(l : List[String]) = super.munge(l) :+ "Stack2"
}

class Bottom {
  this : Munger =>

  def apply(): Unit = {
    println(
      munge(List("bottom"))
    )
  }
}

scala> (new Bottom with Replace1)()
List(bottom, Replace1)

scala> (new Bottom with Replace1 with Replace2)()
List(bottom, Replace2) //Replace1's munge was overridden and never ran

scala> (new Bottom with Replace1 with Stack1)()
List(bottom, Replace1, Stack1) //Stack1 called super.munge, which invoked the
//munge from the trait to the left

scala> (new Bottom with Replace1 with Stack2)()
List(bottom, Replace1, Stack2Parent, Stack2) //Stack2's super.munge called to its
//superclass, whereas Stack2Parent's super.munge called the trait to the left
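The order in that last example falls out of the linearization rule. As a toy model in Ruby (only a model: types are plain strings, AnyRef/Any are left out, and Bottom has no parents because this : Munger => is a self-type, not a parent), the rule can be written as L(C) = C, L(Cn) +> ... +> L(C1), where +> is used here as an ASCII stand-in for the spec's right-associative concatenation: a +> b keeps all of b and drops from a anything b already contains. That is what makes shared ancestors show up once, as late as possible.

```ruby
# Toy model of Scala class linearization, for illustration only.
# Each type maps to its declared parents, leftmost first; AnyRef/Any omitted.
PARENTS = {
  "Munger" => [], "Bottom" => [],
  "Replace1" => ["Munger"], "Replace2" => ["Munger"],
  "Stack2Parent" => ["Munger"], "Stack2" => ["Stack2Parent"],
  # stand-ins for the anonymous classes created by `new Bottom with ...`
  "Bottom with Replace1 with Replace2" => ["Bottom", "Replace1", "Replace2"],
  "Bottom with Replace1 with Stack2" => ["Bottom", "Replace1", "Stack2"]
}

# a +> b: everything in b, preceded by whatever in a is not already in b.
def concat_right(a, b)
  (a - b) + b
end

# L(C) = C, L(Cn) +> ... +> L(C1), folding from the leftmost parent outward.
def linearize(name)
  rest = PARENTS.fetch(name).reduce([]) do |acc, parent|
    concat_right(linearize(parent), acc)
  end
  [name] + rest
end

p linearize("Bottom with Replace1 with Stack2")
# => ["Bottom with Replace1 with Stack2", "Stack2", "Stack2Parent", "Replace1", "Munger", "Bottom"]
```

Read left to right, that is the order the super calls visit: Stack2, then Stack2Parent, then Replace1, which doesn't call super, so the chain stops there. Since each trait appends after its super call returns, the strings in the REPL output appear in the reverse order.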

Jul 31, 2012

Trying to unit test some code that was to run inside Solr, I bumped into this:

Cannot mock/spy class org.apache.solr.core.SolrCore
Mockito cannot mock/spy following:
  - final classes
  - anonymous classes
  - primitive types

Fortunately, there’s a simple solution: PowerMock. After adding the following two annotations to my test class definition (and the requisite Maven dependency declarations), everything just worked. No changes needed to the actual Mockito calls themselves. Sweet.

@RunWith(PowerMockRunner.class)
@PrepareForTest( { SolrCore.class })

Jul 03, 2012

When trying to use Apache Pig in local mode to connect to a stand-alone HBase using HBaseStorage, I kept getting errors like this:

ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal error. Not a host:port pair: ?42548@endive.local10.1.10.70,    64058,1341349176322

The unrecognized host:port pair corresponds to a happy sign-on message from the HBase log:

INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: The identifier of this process is 42548@endive.local

The problem is a version mismatch: HBase apparently changed the format of this data in 0.92. As of Pig 0.10.0, the solution is to downgrade HBase to 0.90.6.