lankycoder

Receive files via Bluetooth on Ubuntu

lorrin — Thu, 08 Jan 2015 21:39:44 +0000

By default, nothing happens when you try to send a file to your Ubuntu machine via Bluetooth. Pairing doesn’t help. I found the solution at Tech Areena:

1) Launch the Personal File Sharing settings, which can be found via a search in the Unity Dash.

2) Enable receipt of files via Bluetooth.

One-liner SSH via jump box using ProxyCommand

lorrin — Sat, 11 Jan 2014 06:16:04 +0000

There are quite a few posts out there on how to make multi-hop SSH easier. Often this is called SSH’ing via jump box or proxy host.

Most of them work via netcat (nc), which is a bit finicky. A better, less mentioned option is SSH’s -W flag. Implemented in your ~/.ssh/config, it looks like this:

Host my_server
  IdentityFile server_key.pem
  HostName 172.31.4.82
  User username
  ProxyCommand ssh -i key_for_jumpbox.pem -W %h:%p jumpbox_user@jump.box.host

Now just ssh my_server and you’re off to the races! For a quick-n-dirty one-liner without editing your SSH config, it looks like this:

ssh -i server_key.pem -o "ProxyCommand ssh -W %h:%p -i key_for_jumpbox.pem jumpbox_user@jump.box.host" username@172.31.4.82
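If the one-liner has to be launched from a script, it can help to assemble the arguments programmatically. The Python sketch below builds the same command; the function name jump_ssh_argv and its parameters are made up for illustration, and it only constructs the argument list rather than running ssh.

```python
import shlex


def jump_ssh_argv(target, target_key, jump, jump_key):
    """Build the argv for ssh-ing to `target` through `jump` using -W."""
    # %h and %p are left untouched; ssh expands them to the target host and port.
    proxy = "ssh -W %h:%p -i {} {}".format(shlex.quote(jump_key), shlex.quote(jump))
    return ["ssh", "-i", target_key, "-o", "ProxyCommand " + proxy, target]


argv = jump_ssh_argv(
    "username@172.31.4.82", "server_key.pem",
    "jumpbox_user@jump.box.host", "key_for_jumpbox.pem",
)
# Print a copy-pasteable shell command line.
print(" ".join(shlex.quote(a) for a in argv))
```

Passing argv to subprocess.call would run it; keeping it as a list avoids a second layer of shell quoting around the ProxyCommand option.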

A very clever solution described on the Gentoo Wiki enables a simple syntax: ssh host1+host2. But it gets uglier with differing usernames: ssh user1%host1+host2 -l user2. Also it uses netcat rather than -W and doesn’t appear to play nicely with needing to specify key files with -i. A little monkeying could solve those problems. A project for a future day.

On another note, I find it useful to alias ssh_unsafe and scp_unsafe as follows:

alias ssh_unsafe="ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no"
alias scp_unsafe="scp -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no"

Handy when connecting to a box for which you do not care to remember or verify the host key.

“mysql_config not found” installing MySQL-python with MacPorts

lorrin — Wed, 16 Oct 2013 01:14:49 +0000

UPDATE 2013-10-16: MacPorts now has a mysql_select package that cleanly solves this problem. Run the following and then pip will be able to find mysql_config without issue.

sudo port install mysql_select
sudo port select mysql mysql56

Using pip to install MySQL-python (aka MySQLdb) gives the error “EnvironmentError: mysql_config not found” when run on a system where MySQL has been installed via MacPorts. The solution is to tell the installer where mysql_config can be found by appending “mysql_config = /opt/local/bin/mysql_config5” to site.cfg. Assuming use of virtualenvwrapper (highly recommended!):

pip install --no-install MySQL-python
... ignore errors ...
VENV=$(dirname $(dirname $(which python))); echo "mysql_config = /opt/local/bin/mysql_config5" >> $VENV/build/MySQL-python/site.cfg
pip install --no-download MySQL-python

virtualenvwrapper recipe

lorrin — Wed, 16 Oct 2013 01:13:18 +0000

Virtualenvwrapper is a great way to manage Python environments. This is a quick cheatsheet for using it.

Setup

Get Python

On OS X, using MacPorts:

sudo port install python27
sudo port select python python27

Install standard packages

sudo easy_install pip
sudo pip install virtualenv
sudo pip install virtualenvwrapper
sudo pip install yolk

Configure virtualenvwrapper

In your .zshrc, .bashrc, etc., add:

source $(dirname $(which python))/virtualenvwrapper.sh

In your .zshenv, .bashrc, etc., add:

export WORKON_HOME=~/.virtualenvs

Create .virtualenvs directory:

mkdir $WORKON_HOME

List available environments

workon

Make an environment

Note: environments are stored in your $WORKON_HOME directory and you can issue these commands from anywhere.

mkvirtualenv myproject

Select an environment

workon myproject

Within an environment, pip install packages as usual.

See the Virtualenvwrapper docs for more information.

GNU find on OS X

lorrin — Sat, 10 Aug 2013 06:57:42 +0000

The GNU version of find has some nice features (like -readable). You can install it with MacPorts using sudo port install findutils. However, this installs it as gfind. There are some references on the web to using the +with_default_names variant to avoid this, but a quick check of port variants findutils reveals that there is no such variant. This is intentional. The new approach is that /opt/local/libexec/gnubin/ contains symlinks with the native names. So add this directory to your path as well and you’re all set.

For more GNU goodness, check out the md5sha1sum package (for md5sum) as well as coreutils.

Configuring Apache for Perfect Forward Secrecy

lorrin — Thu, 04 Jul 2013 05:57:01 +0000

I had trouble finding a good recipe for Apache SSL configuration that achieves perfect forward secrecy while avoiding other pitfalls such as the BEAST attack, so I made my own.

First, SSLv2 is vulnerable, so disable it. On my Ubuntu box this was already done in ssl.conf:

# enable only secure protocols: SSLv3 and TLSv1, but not SSLv2
SSLProtocol all -SSLv2

TLSv1 is widely supported, so it makes sense to include -SSLv3 as well.

Second, tell the server to enforce the order in which its ciphers are specified, rather than deferring to the browser’s preference:

SSLHonorCipherOrder On

Next, compose the cipher list. The BEAST attack against how SSLv3 and TLSv1.0 do cipher block chaining makes most of the otherwise good ciphers (e.g. AES) vulnerable, leaving only the weaker RC4 as a viable option for those protocols. That is easier said than done, since Apache doesn’t allow conditional cipher list control based on protocol, and one can’t simply disable those protocols because browser support for TLS v1.1 and higher is still weak. As a proxy for checking the protocol version, I therefore resort to preferring ciphers that were only introduced after TLSv1.0. TLSv1.1 didn’t introduce anything new, but TLSv1.2 added AEAD ciphers and new hashing algorithms (SHA384, SHA256; prior to that, AES was only available with SHA1 hashing). Thus the first organizational principle of the list is: ciphers from TLSv1.2 and above, followed by RC4, followed by ciphers from older protocols.

Perfect forward secrecy is achieved by using ephemeral Diffie-Hellman (EDH). Ephemeral elliptic-curve Diffie-Hellman (EECDH) is reasonably fast, so I prefer it. Plain EDH is slow; consider omitting it if you’re serving a lot of traffic on limited hardware. Thus the second organizational principle is: use each cipher only in combination with EECDH or plain EDH. (But prefer to relinquish perfect forward secrecy before being vulnerable to BEAST.)

Finally, for good hygiene, explicitly disable anything using no authentication (!aNULL), no or weak encryption (!eNULL, !EXP, !LOW), or weak hashing (!MD5).

The recipe thus is:

SSLCipherSuite EECDH+AES:EDH+AES:-SHA1:EECDH+RC4:EDH+RC4:RC4-SHA:EECDH+AES256:EDH+AES256:AES256-SHA:!aNULL:!eNULL:!EXP:!LOW:!MD5

The syntax for the recipe is the same as for the openssl ciphers command. Of note, the leading “-” in -SHA1 means remove any ciphers with SHA1 hashing that had been previously added, whereas RC4-SHA is just the name of a particular cipher.
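To make the prefix rules concrete, here is a small Python sketch that labels each element of the recipe by what its leading character means in the openssl ciphers syntax. The classify function and its labels are my own naming; it only looks at prefixes and does not ask OpenSSL which ciphers a selector expands to, so openssl ciphers -v remains the authority.

```python
RECIPE = ("EECDH+AES:EDH+AES:-SHA1:EECDH+RC4:EDH+RC4:RC4-SHA:"
          "EECDH+AES256:EDH+AES256:AES256-SHA:!aNULL:!eNULL:!EXP:!LOW:!MD5")


def classify(token):
    """Describe what one colon-separated element of a cipher string does."""
    if token.startswith("!"):
        return "kill"    # permanently excluded; cannot be re-added later
    if token.startswith("-"):
        return "remove"  # dropped from the list, but may be re-added later
    if token.startswith("+"):
        return "move"    # moves already-listed ciphers to the end of the list
    return "add"         # a cipher name, or selectors joined with '+'


for token in RECIPE.split(":"):
    print("%-7s %s" % (classify(token), token))
```

The point to notice in the output is that -SHA1 is labelled as a removal while RC4-SHA and AES256-SHA are labelled as additions, since the dash only has special meaning at the start of an element.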

Unfortunately, older versions of Apache might not include all of these. E.g. Apache 2.2 on Ubuntu 12.04 LTS lacks EECDH (and there is no EDH RC4 variant). Thus in practice most browsers would use RC4 without perfect forward secrecy (but at least no BEAST vulnerability). The solution is to get a newer version of Apache, either by waiting for Ubuntu 13.10 or by obtaining it elsewhere. Configuration can be tested easily via SSLLabs.

Update 2013-11-09:

I’ve found a few alternate recommendations around the web. They put less emphasis on BEAST protection (perhaps wise; BEAST is mostly mitigated client-side now) and more emphasis on perfect forward secrecy. To varying degrees they also have stronger preferences for GCM and greater reluctance to accept RC4.

Of particular note are, I think, the recommendations from Mozilla OpSec, Ivan Ristic, and Geoffroy Gramaize.

Personally, I’m going to go with Mozilla OpSec’s. Their reasoning is well explained on their page. Of note, they prefer AES128 over AES256. In their words: “[AES128] provides good security, is really fast, and seems to be more resistant to timing attacks.”

Noteworthy in Ivan Ristic’s and Geoffroy Gramaize’s recommendations is that SSLv3 is disabled. I think this mostly just breaks IE6, though some security-related differences between SSLv3 and TLS v1.0 are mentioned on Wikipedia.

I also didn’t previously talk about CRIME and BREACH. To protect against CRIME, disable SSL compression. This is included in the examples linked. To protect against BREACH, you need to disable compression at the HTTP level. For Apache 2.4, just do this once globally:


  SetEnvIfExpr "%{HTTPS} == 'on'" no-gzip

For older versions of Apache, place this in each VirtualHost where SSLEngine is on:


    SetEnv no-gzip

Custom Joda-Time DateFormatter in Jackson

lorrin — Sat, 29 Jun 2013 06:52:56 +0000

Here is how to customize how Jackson serializes Joda-Time dates to JSON:

objectMapperFactory.registerModule(new SimpleModule() {
    {
        addSerializer(DateTime.class, new StdSerializer<DateTime>(DateTime.class) {
            @Override
            public void serialize(DateTime value, JsonGenerator jgen, SerializerProvider provider) throws IOException, JsonGenerationException {
                 jgen.writeString(ISODateTimeFormat.date().print(value));
            }
        });
    }
});

You can use this in combination with JodaModule; just register it after the JodaModule is registered.

Alternatively, if all you need is to write DateTimes in ISO 8601 format instead of as Unix epochs, you can use the following:

objectMapperFactory.registerModule(new JodaModule());
objectMapperFactory.disable(SerializationFeature.WRITE_DATES_AS_TIMESTAMPS);

JodaModule registers a custom DateTimeSerializer that takes the setting into account. However, unlike the standard Java Date implementation, SerializationFeature.WRITE_DATE_KEYS_AS_TIMESTAMPS and getSerializationConfig().setDateFormat(myDateFormat) are ignored, so there is no way to fine-tune the serialization.

Ultimately a more elegant solution would be to give JodaModule some additional constructors or setters that allow passing in a DateFormatter that its various helper classes would use.

Postgres timezone handling

lorrin — Fri, 29 Mar 2013 06:23:32 +0000

If you use the timestamptz data type, Postgres does timezone conversions automatically.

First, some test data:

pg=> create table time_test (id text, stamp timestamptz);
CREATE TABLE
pg=> insert into time_test values('foo', now());
INSERT 0 1
pg=> insert into time_test values('foo', now());
INSERT 0 1
pg=> select * from time_test;
 id  |             stamp
-----+-------------------------------
 foo | 2013-01-22 00:53:40.325041+00
 foo | 2013-01-22 00:54:02.021018+00
(2 rows)

Client-supplied data in other timezones is automatically converted for comparisons:

pg=> select * from time_test where stamp > '2013-01-21 16:54:00 PST';
 id  |             stamp
-----+-------------------------------
 foo | 2013-01-22 00:54:02.021018+00
(1 row)

Results can be converted on the fly:

pg=> select id, stamp at time zone 'PST' from time_test;
 id  |          timezone
-----+----------------------------
 foo | 2013-01-21 16:53:40.325041
 foo | 2013-01-21 16:54:02.021018
(2 rows)

…once, or for the whole session.

pg=> set session time zone "pst8pdt";
SET
pg=> select * from time_test;
 id  |             stamp
-----+-------------------------------
 foo | 2013-01-21 16:53:40.325041-08
 foo | 2013-01-21 16:54:02.021018-08
(2 rows)

pg=> insert into time_test values ('bar', '2013-01-21 16:55:03');
INSERT 0 1
pg=> select * from time_test;
 id  |             stamp
-----+-------------------------------
 foo | 2013-01-21 16:53:40.325041-08
 foo | 2013-01-21 16:54:02.021018-08
 bar | 2013-01-21 16:55:03-08
(3 rows)
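
The same arithmetic can be checked outside the database. A minimal Python sketch, assuming a fixed UTC-8 offset stands in for PST (the PST constant and the stamps list are illustrative names; the values are the two foo rows from the example):

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
PST = timezone(timedelta(hours=-8), "PST")  # fixed offset, no daylight saving

# The two rows from the example table, as UTC instants.
stamps = [
    datetime(2013, 1, 22, 0, 53, 40, 325041, tzinfo=UTC),
    datetime(2013, 1, 22, 0, 54, 2, 21018, tzinfo=UTC),
]

# Comparing against a PST value compares the underlying instants.
cutoff = datetime(2013, 1, 21, 16, 54, 0, tzinfo=PST)
later = [s for s in stamps if s > cutoff]
print(later)

# Converting for display, similar in spirit to "at time zone 'PST'".
for s in stamps:
    print(s.astimezone(PST).isoformat(sep=" "))
```

As with timestamptz, only the second row is later than the cutoff, and the displayed wall-clock times shift by eight hours while the instants stay the same.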

Cannot see cursor in vi-mode with oh-my-zsh on Ubuntu

lorrin — Tue, 26 Mar 2013 05:57:36 +0000

I’m not sure who is to blame here, but using vi-mode in oh-my-zsh in Gnome Terminal on Ubuntu, I find that when I move from word to word, the cursor vanishes briefly. This negates the efficiency of trying to edit with vi keybindings! Simple work-around: install roxterm.

Avoiding SSH host key verification failures

lorrin — Sat, 01 Dec 2012 04:55:37 +0000

While working on a deployment process that automatically updated an ElasticIP to point to a new instance, I got to see a lot of these:

Offending key in /Users/lhn/.ssh/known_hosts:45
RSA host key for xxx.yyy.zzz has changed and you have requested strict checking.
Host key verification failed.

Here is a sed one-liner to delete the offending key (on line 45 in this case) from SSH’s known_hosts file. This is a reasonable thing to do when you know why the host key has changed and don’t expect it to do so very often.

sed -i -e '45d' ~/.ssh/known_hosts

-i is for in-place editing and -e provides the expression, which is to delete line 45.
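
For a deployment script that wants to do this automatically, the line number can be pulled out of the error message first. This is a rough Python sketch under the assumption that the message has the "Offending key in FILE:N" shape shown above; offending_line and delete_line are hypothetical helper names.

```python
import re


def offending_line(message):
    """Return (path, line_number) from an 'Offending ... key in FILE:N' message."""
    m = re.search(r"Offending .*key in (\S+):(\d+)", message)
    return (m.group(1), int(m.group(2))) if m else None


def delete_line(path, line_number):
    """Rewrite `path` without the given 1-based line, like sed 'Nd'."""
    with open(path) as f:
        lines = f.readlines()
    del lines[line_number - 1]
    with open(path, "w") as f:
        f.writelines(lines)
```

Calling delete_line(*offending_line(stderr_text)) would then mirror the sed command, with the same caveat that you should know why the key changed.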

However, sometimes you expect the host key to change frequently and a better approach is to not check or store the host key in the first place. That can be achieved as follows (kudos Peter Leung).

ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no user@some.host

This tells SSH to use always-empty /dev/null as its place to record host keys and to not complain when connecting to a host with an unknown key. Thus no host keys are stored or checked.

At the risk of stating the obvious, this does of course side-step SSH’s ability to protect you from man-in-the-middle attacks.

Thunderbird mailboxes growing without bounds

lorrin — Fri, 26 Oct 2012 01:15:59 +0000

The other day I got a low disk space warning because my Thunderbird Inbox had grown to over 100 GB. It turned out my Inbox, Trash, and Sent mailbox folders were all impacted by some bug in which Thunderbird would keep fetching the same messages again and again from the server (IMAP) and appending them to the mailbox file. Compacting the mailbox would recover the disk space, but the mailboxes would start growing again shortly thereafter.

The magic incantation to resolve the problem was some quick succession of compacting the mailbox (right-click -> Compact) and repairing it (right-click -> Properties... -> Repair Folder). I did have Preferences -> Advanced -> Network & Disk Space -> Compact all folders when it will save over 1 MB in total set, but it wasn’t kicking in.

Upload .m3u playlist to Google Music

lorrin — Fri, 12 Oct 2012 05:46:58 +0000

Google Music Manager uploads are based on looking for music files in a particular directory. This isn’t helpful if you have a large directory structure of music and want to upload a subset of it. In my case, I want to use Banshee’s smart playlist feature to select songs to upload. Fortunately Banshee has a .m3u playlist export, but this is only half the battle. The other half is to use symlinks to fool Google Music Manager into thinking the songs in the playlist are in its directory.

The following shell command does the trick. It takes input lines in the .m3u of the form PATH/ARTIST/ALBUM/SONG.EXT (e.g. ../../../mnt/onion/media/Music/Banshee/Wir Sind Helden/Soundso/01. (Ode) An Die Arbeit.mp3) and makes symlinks of the form ARTIST_ALBUM_SONG.EXT.

cat ~/my_playlist.m3u | ruby -ne 'IO.popen(["ln", "-s", "#{$&}", "./#{$2[0..50]}_#{$3[0..50]}_#{$4[0..50]}.#{$5}"]) if $_.strip =~ /^([^#].*)\/([^\/]*)\/([^\/]*)\/([^\/]*)\.([^.\/]*)$/'

For each line ($_) that matches the pattern (not starting with #, having at least the expected number of slashes), execute ln -s with the original path as the target and the flattened artist_album_song.ext name as the link. The [0..50] ranges keep filename length manageable.
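
For anyone who finds the Ruby one-liner hard to read, here is roughly the same logic as a Python sketch. The group names artist, album and song are my reading of the example path rather than anything the pattern enforces, and link_name and link_playlist are made-up function names.

```python
import os
import re

# Same pattern as the Ruby one-liner: prefix/dir/dir/name.ext, skipping '#' lines.
LINE = re.compile(r"^([^#].*)/([^/]*)/([^/]*)/([^/]*)\.([^./]*)$")


def link_name(line, limit=51):
    """Return the flat symlink name for one playlist line, or None to skip it."""
    m = LINE.match(line.strip())
    if not m:
        return None
    artist, album, song, ext = m.group(2, 3, 4, 5)
    # Ruby's [0..50] is inclusive, i.e. the first 51 characters.
    return "%s_%s_%s.%s" % (artist[:limit], album[:limit], song[:limit], ext)


def link_playlist(playlist):
    """Create one symlink in the current directory per playlist entry."""
    with open(playlist) as f:
        for line in f:
            name = link_name(line)
            if name:
                os.symlink(line.strip(), name)
```

Like the original, link_playlist links to the path exactly as written in the playlist, so relative entries only resolve if the links are created in the directory the playlist paths are relative to.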

Stripping Unicode Byte Order Mark to resolve SolrException: undefined field: “?myField” during Ingest

lorrin — Fri, 17 Aug 2012 21:38:33 +0000

Solr has a handy ability to ingest local CSV files. The neatest aspect is that you can populate multi-valued fields by sub-parsing an individual field. E.g. the following will ingest /tmp/input.csv and split SomeField into multiple values by semi-colon delimiters:

curl http://localhost:80/solr/my_core/update\?stream.file\=/tmp/input.csv\&stream.contentType\=text/csv\;charset\=utf-8\&commit\=true\&f.SomeField.split\=true\&f.SomeField.separator\=%3B

When running an ingest, I got the following response, which was confusing since myField was, in fact, defined in my schema:



    
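<response>
    <lst name="responseHeader">
        <int name="status">400</int>
        <int name="QTime">1</int>
    </lst>
    <lst name="error">
        <str name="msg">undefined field: "myField"</str>
        <int name="code">400</int>
    </lst>
</response>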

A peek in the log provided a clue (note the leading question mark):

SEVERE: org.apache.solr.common.SolrException: undefined field: "?myField"

Examining a hex dump of the CSV file revealed that it started with a UTF-8 Byte Order Mark:

xxd /tmp/input.csv | head
0000000: efbb bf...

One way to strip the BOM is with Bomstrip, a collection of BOM-stripping implementations in various languages, including a Perl one-liner. Alternatively, just open the file in Vim, do :set nobomb and save. Done!
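
If the CSV files come out of an automated pipeline, the check is easy to script as well. A minimal Python sketch (strip_bom and strip_bom_in_place are illustrative names, and it reads the whole file into memory, which may not suit very large inputs):

```python
import codecs


def strip_bom(data):
    """Return `data` (bytes) without a leading UTF-8 byte order mark."""
    if data.startswith(codecs.BOM_UTF8):  # the bytes ef bb bf
        return data[len(codecs.BOM_UTF8):]
    return data


def strip_bom_in_place(path):
    """Rewrite the file at `path` with any leading UTF-8 BOM removed."""
    with open(path, "rb") as f:
        data = f.read()
    with open(path, "wb") as f:
        f.write(strip_bom(data))
```

Running strip_bom_in_place on the input file before the curl call would leave files without a BOM unchanged.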

Setting default package versions with MacPorts

lorrin — Thu, 09 Aug 2012 19:54:40 +0000

Given a working Perl 5.12 install (via MacPorts), doing a sudo port install perl5.16 does not update the perl symlink:

% ls -alF /opt/local/bin/perl
lrwxr-xr-x 1 root admin 8 Jun 6 15:19 /opt/local/bin/perl@ -> perl5.12

The magic incantation is to install the perl5_16 variant of the perl5 package:

sudo port install perl5 +perl5_16

With this done, the symlink is updated and perl loads the expected version.

% ls -alF /opt/local/bin/perl                                                                                                    
lrwxr-xr-x  1 root  admin  8 Aug  9 12:45 /opt/local/bin/perl@ -> perl5.16
% perl -v
This is perl 5, version 16, subversion 0 (v5.16.0) built for darwin-thread-multi-2level

Scala’s abstract override, stackable traits, and object hierarchy linearization

lorrin — Thu, 09 Aug 2012 17:34:46 +0000

Stackable traits in Scala refers to being able to mix in multiple traits that work together to apply multiple modifications to a method. This involves invoking super.theMethod and modifying its input and/or output. But what is super in the context of a trait? The class (or trait) the trait extends from? The class the trait is being mixed into? It depends! All the mixed in traits and all the superclasses are linearized. super invokes the nearest preceding definition further up the chain. The general effect is that mixins to the right (and their ancestor classes) come earlier than those to the left. However, ancestors that are shared are deduped to only show up once, and they show up as late as possible. Here’s a detailed description of the Scala object hierarchy linearization algorithm.

If a trait which extends MyInterface tries to invoke super.myMethod but MyInterface.myMethod is abstract, the compiler generates this error:

error: method myMethod in trait MyInterface is accessed from super. It may not be abstract unless it is overridden by a member declared `abstract' and `override'

What this means is: generally, invoking an abstract method of a superclass is an error. However, with traits, the meaning of super is not known at compile time. The call would be valid if the trait were mixed into a class that had an implementation of the method. But the compiler errs on the side of caution unless told otherwise. abstract override def myMethod signals that you expect an implementation of the method to be available at run-time and that the compiler should not treat the super.myMethod invocation as an error. (Note: this applies regardless of whether the trait itself provides an implementation of the method.)

Here are some examples:

trait Munger {
  def munge(l : List[String]) : List[String]
}

trait Replace1 extends Munger {
  override def munge(l : List[String]) = l :+ "Replace1"
}

trait Replace2 extends Munger {
  override def munge(l : List[String]) = l :+ "Replace2"
}

//abstract override def munge required in the Stack* traits because they invoke
//abstract super.munge

trait Stack1 extends Munger {
  abstract override def munge(l : List[String]) = super.munge(l) :+ "Stack1"
}

trait Stack2Parent extends Munger {
  abstract override def munge(l : List[String]) = super.munge(l) :+ "Stack2Parent"
}

trait Stack2 extends Stack2Parent {
  abstract override def munge(l : List[String]) = super.munge(l) :+ "Stack2"
}

class Bottom {
  this : Munger =>

  def apply() {
    println(
      munge(List("bottom"))
    )
  }
}

scala> (new Bottom with Replace1)()
List(bottom, Replace1)

scala> (new Bottom with Replace1 with Replace2)()
List(bottom, Replace2) //Replace1's munge was overridden and never ran

scala> (new Bottom with Replace1 with Stack1)()
List(bottom, Replace1, Stack1) //Stack1 called super.munge, which invoked the
//munge from the trait to the left

scala> (new Bottom with Replace1 with Stack2)()
List(bottom, Replace1, Stack2Parent, Stack2) //Stack2's super.munge called to its
//superclass, whereas Stack2Parent's super.munge called the trait to the left
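
For readers who know Python better than Scala, cooperative super() gives a rough analogue. This is only a sketch: Python's method resolution order uses a different linearization algorithm than Scala's, and bases are listed highest-priority first, so the mixin order is written in reverse. The class names mirror the traits above; WithStack2 is a name invented for the combined class.

```python
class Munger:
    def munge(self, items):
        raise NotImplementedError


class Replace1(Munger):
    def munge(self, items):
        return items + ["Replace1"]


class Stack1(Munger):
    def munge(self, items):
        # super() is resolved along the MRO of the final class, not fixed here.
        return super().munge(items) + ["Stack1"]


class Stack2Parent(Munger):
    def munge(self, items):
        return super().munge(items) + ["Stack2Parent"]


class Stack2(Stack2Parent):
    def munge(self, items):
        return super().munge(items) + ["Stack2"]


class Bottom:
    def __call__(self):
        return self.munge(["bottom"])


# Roughly "new Bottom with Replace1 with Stack2", with the order flipped.
class WithStack2(Stack2, Replace1, Bottom):
    pass


print(WithStack2()())
print([c.__name__ for c in WithStack2.__mro__])
```

The second print shows the linearized chain, in which the shared ancestor Munger appears once and after both Stack2Parent and Replace1, which is the same "shared ancestors show up as late as possible" effect described above.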

Mocking Java final classes

lorrin — Tue, 31 Jul 2012 17:57:19 +0000

Trying to unit test some code that was to run inside Solr, I bumped into this:

Cannot mock/spy class org.apache.solr.core.SolrCore
Mockito cannot mock/spy following:
  - final classes
  - anonymous classes
  - primitive types

Fortunately, there’s a simple solution: PowerMock. After adding the following two annotations to my test class definition (and the requisite Maven dependency declarations), everything just worked. No changes needed to the actual Mockito calls themselves. Sweet.

@RunWith(PowerMockRunner.class)
@PrepareForTest( { SolrCore.class })

Apache Pig “Not a host:port pair” errors using HBase

lorrin — Tue, 03 Jul 2012 22:43:02 +0000

When trying to use Apache Pig in local mode to connect to a stand-alone HBase using HBaseStorage, I kept getting errors like this:

ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal error. Not a host:port pair: ?42548@endive.local10.1.10.70,    64058,1341349176322

The unrecognized host:port pair corresponds to a happy sign-on message from the HBase log:

INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: The identifier of this process is 42548@endive.local

The problem is a version mismatch: HBase apparently changed the format of this data in 0.92. As of Pig 0.10.0, the solution is to downgrade HBase to 0.90.6.

VNC to OS X Lion: Gratuitous Passwords and Empty Grey Linen Desktops

lorrin — Thu, 29 Mar 2012 06:36:51 +0000

There are two tricks to using VNC from a non-Mac to connect to a Mac running OS X Lion.

1) Turn on the VNC server by enabling System Preferences -> Sharing -> Screen Sharing. Even though it provides little security, a VNC password must be set so that OS X will present an authentication scheme that makes sense to a standard VNC client. Enable “VNC viewers may control screen with password”.

2) After connecting, you will see a grey linen-backgrounded desktop with nothing in it. Type your user name and password. After logging in, your desktop contents will display!

“ERROR: cannot execute UPDATE in a read-only transaction”

lorrin — Mon, 12 Mar 2012 20:56:02 +0000

"ERROR: cannot execute UPDATE in a read-only transaction" is what PostgreSQL tells you when you try to execute an update on a replica instead of on the master.

Browsing Without Being Tracked

lorrin — Sat, 10 Mar 2012 19:03:13 +0000

TL;DR: Install Do Not Track Plus, use Duck Duck Go (with !sp sometimes) for web searches. To go the extra mile, also install Straight Google (requires Greasemonkey), Cookie Whitelist and BetterPrivacy.

I don’t like the idea of advertisers, search engines, and social networks building extensive profiles about what I do online (why). A short-list of tools to avoid such tracking:

Prevent Inter-Website Tracking

Abine’s Do Not Track Plus is nearly a one-stop shop. I wish more details were available about what it does, but the gist is:

Install and maintain a large number of generic do-not-track-me cookies for many ad networks and tracking services. When content is fetched from these sites, the generic cookie is sent rather than one which is unique to you
Special handling for social buttons (e.g. Like this on Facebook), in which the button is fetched anonymously, but, should you choose to click on it, the veil is lifted and the Like is associated with your account
Many ads are blocked from rendering too, which I hadn’t expected. Those that remain are innocuous enough that I do not use Ad Block Plus any more.

Reduce Google Information Gathering

I store some personal information on Google (thanks to Google+, Google Calendar, etc.). I do not want Google to associate that personal information with all the web searches I do every day. Do Not Track Plus is of limited value here: if you sign in to Google, Do Not Track Plus will be obliged to permit your identity to be sent. Additional steps are needed:

Don’t search with Google. I prefer Duck Duck Go for most searches, thanks to their Zero-click Info and other goodies.
For needle-in-the-haystack searches, I find Google often has the best results. Startpage is an anonymous Google Search proxy. Rather than use it directly, I just prefix my Duck Duck Go searches with !sp when needed.
Straight Google (requires Greasemonkey) prevents Google’s click-tracking. This is less important if you follow the above steps to avoid doing your web searches at google.com. However, they still track links clicked on their other products, which Straight Google can prevent.

Control Intra-Website Tracking

The above steps should take care of attempts to track your movement across the web. However, most websites will still store long-term cookies in your browser to track your history of interaction with that particular website.

Cookie Whitelist is designed to allow only whitelisted cookies to be accepted. In practice, this breaks too many websites. For less hassle, configure as follows:

Cookie button (the red one): ON. This lets any website set a cookie, but it will be deleted at the end of the session
For the few websites you wish to remain logged in to (or otherwise personalized) click the green button to whitelist as needed
Do not accept third-party cookies

BetterPrivacy is to Flash LSOs (local shared objects, or Flash cookies) what Cookie Whitelist is to regular cookies. Alas with a more confusing set of configuration options.

Note: this post is (obviously?) not about how to avoid your employer/ISP/government monitoring what you do online. To hide what you are doing from someone who has access to all your traffic, you need encryption and proxying. A good first stop to get some encryption is EFF’s HTTPS Everywhere. This goes a long way to prevent the person nearby in the coffee shop from stealing your Facebook account.

Originally published 2012-03-10. Updated 2012-03-14 with intra-website tracking steps.

Google’s Click Tracking and How to Disable It

lorrin — Sat, 10 Mar 2012 18:05:07 +0000

By default, Google tracks every search result you click on. They do this surreptitiously: URLs in Google search results appear to go directly to the destination:

But, upon click, URLs in Google search results change to go to Google first!

Straight Google removes this tracking from Google URLs across all Google products. Easy to install and no configuration needed, but you must install Greasemonkey first.

Fixing key-repeat in OS X Lion (and restoring sanity for Vim keybindings) on a per-application basis

lorrin — Sat, 03 Mar 2012 05:06:15 +0000

Starting with OS X Lion, holding down a key will bring up a menu of alternate characters rather than repeating the key. (This is a feature). There are many tips on how to re-enable key-repeat globally. But you can also control the behavior per-application (thanks, Egor Ushakov). This is handy for e.g. IntelliJ or RubyMine, or any other app that provides Vim-style keyboard bindings. The magic commands are:

% defaults write com.jetbrains.intellij ApplePressAndHoldEnabled -bool false
% defaults write com.jetbrains.rubymine ApplePressAndHoldEnabled -bool false

But how do you figure out what the magic identifier for your application is? Simple: defaults domains will list them all:

defaults domains | gsed -e 's/, /\n/g' | grep jetbrains
com.jetbrains.intellij
com.jetbrains.intellij.ce
com.jetbrains.rubymine
jetbrains.communicator.core

Note that in order to munge the commas into newlines for grep, gsed was required because OS X default sed cannot (easily) insert newlines.
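
If GNU sed is not at hand, the comma splitting can also be done in a few lines of Python. A small sketch, with matching_domains and defaults_domains as made-up helper names; the second function only makes sense on OS X, where the defaults tool exists.

```python
import subprocess


def matching_domains(defaults_output, needle):
    """Split the comma-separated output of `defaults domains` and filter it."""
    domains = [d.strip() for d in defaults_output.split(",")]
    return [d for d in domains if needle in d]


def defaults_domains():
    """Run `defaults domains` (OS X only) and return its raw output."""
    return subprocess.check_output(["defaults", "domains"]).decode()
```

On a Mac, print("\n".join(matching_domains(defaults_domains(), "jetbrains"))) should give the same kind of list as the gsed pipeline.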

Inserting newlines with sed on OS X

lorrin — Sat, 03 Mar 2012 04:58:46 +0000

Out of the box, inserting newlines does not work with sed on OS X:

% echo foo,bar,baz | sed -e 's/,/\n/g'            
foonbarnbaz

The simple solution is to use GNU sed, which is already installed (as gsed), instead of the default BSD version:

% echo foo,bar,baz | gsed -e 's/,/\n/g'           
foo
bar
baz

Alternatively, it is possible (though fiddly) to trick BSD sed into inserting newlines using extquotes.

Random file selections with Python

lorrin — Tue, 21 Feb 2012 18:28:50 +0000

After my previous adventures in slicing and dicing a huge XML file, I wanted a means to randomly select files. But first, the directory had so many entries it was unwieldy on my laptop. The Python script below divvies the files up into directories of up to 1000 files each. (Adaptable to other contexts via slight tweaking of the filename regex and subdir name generation.)

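#!/usr/bin/python
import os
import re

where = '.'  # source directory

for f in os.listdir(where):
  # Match names ending in _COMM-<number>.xml and capture the number.
  m = re.search(r'.*_COMM-([0-9]+)\.xml$', f)
  if m:
    # Files 0-999 go to 000, 1000-1999 to 001, and so on.
    subdir = os.path.join(where, "%03d" % (int(m.group(1)) // 1000))
    try:
      os.mkdir(subdir)
    except OSError:
      pass  # subdirectory already exists
    os.rename(os.path.join(where, f), os.path.join(subdir, f))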

Now on to the random selection, again with Python:

#!/usr/bin/python
import os
import random
import re
import sys

if len(sys.argv) > 1:
  where = sys.argv[1]
else:
  where = '.'  # source directory

# Pick a random numbered subdirectory, then a random file inside it.
subdirs = [d for d in os.listdir(where) if re.search('^[0-9]+$', d)]
subdir = os.path.join(where, random.choice(subdirs))
print(os.path.join(subdir, random.choice(os.listdir(subdir))))

A quick shell loop leverages the Python script to grab files and dump into a repository of test data. Works on ZSH, Bash, perhaps others:

for i in {1..250}; do cp $(./pick_a_file.py sub_dir_with_files) /destination/dir/filename_prefix_$(printf "%03d" $i).xml; done;
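
One caveat with the loop is that the same file can be picked more than once. If distinct files matter, a variant that samples without replacement may be preferable. This is a sketch with a hypothetical pick_files function; note that it chooses uniformly among all files, whereas the script above first chooses a subdirectory and then a file.

```python
import os
import random


def pick_files(where, count):
    """Return `count` distinct file paths from the numbered subdirs of `where`."""
    candidates = []
    for subdir in sorted(os.listdir(where)):
        full = os.path.join(where, subdir)
        if subdir.isdigit() and os.path.isdir(full):
            candidates.extend(os.path.join(full, f) for f in os.listdir(full))
    # random.sample draws without replacement, so no file is picked twice.
    return random.sample(candidates, count)
```

random.sample raises ValueError if fewer than count files are available, which is probably the behavior you want when building a fixed-size test set.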

Thunderbird still displays expunged IMAP messages

lorrin — Wed, 15 Feb 2012 06:11:12 +0000

There is a lot to be said about the intricacies of IMAP delete flags vs. actual expunging of deleted messages and the confusion caused when something is merely flagged for deletion and the user expected it to be really gone. This post is not about that. Everyone agrees that once a message is expunged, it definitely should be gone. But sometimes expunged messages still display in Thunderbird!

I often observe this:

Delete message on the way to work using K-9 on my phone.
Arrive at work and the message is gone from my Inbox in Mail.app.
Come home, download new mail in Thunderbird and see an Inbox full of undead messages.

No amount of re-expunging and re-fetching mail helps. Grepping through the server-side Maildir shows the messages really are gone from the folders in which Thunderbird is still showing them.

It turns out the reason they are still displaying in Thunderbird is mundane client-side index corruption. To clean things up:

Right-click on mailbox
Choose Properties...
Click Repair Folder
Rejoice at tidy mailbox