Aug 17 2012

Solr has a handy ability to ingest local CSV files. The neatest aspect is that you can populate multi-valued fields by sub-parsing an individual field. For example, the following ingests /tmp/input.csv and splits SomeField into multiple values on semicolon delimiters:

curl http://localhost:80/solr/my_core/update\?stream.file\=/tmp/input.csv\&stream.contentType\=text/csv\;charset\=utf-8\&commit\=true\&f.SomeField.split\=true\&f.SomeField.separator\=%3B
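To make the splitting concrete, here is a minimal sketch of an input file that the command above could ingest. The id column and the sample values are hypothetical, chosen only for illustration; SomeField is the field named in the curl command.

```shell
# Hypothetical sample data: the "id" column and the values are made up.
# CSV_PATH is an illustrative override; by default the file lands where
# the curl command above expects it.
csv="${CSV_PATH:-/tmp/input.csv}"

cat > "$csv" <<'EOF'
id,SomeField
1,red;green;blue
2,small
EOF
```

With f.SomeField.split=true and f.SomeField.separator=%3B (a URL-encoded semicolon), document 1 should end up with three values in SomeField (red, green, blue), while document 2 gets a single value.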

When running an ingest, I got the following response, which was confusing since myField was, in fact, defined in my schema:

<?xml version="1.0" encoding="UTF-8"?>
<response>
    <lst name="responseHeader">
        <int name="status">400</int>
        <int name="QTime">1</int>
    </lst>
    <lst name="error">
        <str name="msg">undefined field: "myField"</str>
        <int name="code">400</int>
    </lst>
</response>

A peek in the log provided a clue (note the leading question mark):

SEVERE: org.apache.solr.common.SolrException: undefined field: "?myField"

Examining a hex dump of the CSV file revealed that it started with a UTF-8 Byte Order Mark:

xxd /tmp/input.csv | head
0000000: efbb bf...

One way to strip the BOM is with Bomstrip, a collection of BOM-stripping implementations in various languages, including a Perl one-liner. Alternatively, just open the file in Vim, do :set nobomb and save. Done!
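If neither Bomstrip nor Vim is at hand, the check and the fix can also be scripted with standard tools. The following is a rough sketch, not taken from Bomstrip; strip_bom is a hypothetical helper name, and it assumes head, od, tr, and tail behave as specified by POSIX.

```shell
# Hypothetical helper: copy a file to a new path, dropping a leading
# UTF-8 BOM (bytes EF BB BF) if one is present.
strip_bom() {
    in=$1
    out=$2
    # Render the first three bytes as hex, e.g. "efbbbf".
    first=$(head -c 3 "$in" | od -An -tx1 | tr -d ' \n')
    if [ "$first" = "efbbbf" ]; then
        # tail -c +4 starts output at byte 4, i.e. skips the 3-byte BOM.
        tail -c +4 "$in" > "$out"
    else
        # No BOM: copy the file unchanged.
        cat "$in" > "$out"
    fi
}
```

For example, strip_bom /tmp/input.csv /tmp/input.nobom.csv writes a cleaned copy, which can then be passed to Solr via stream.file.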

Mar 02 2012

Starting with OS X Lion, holding down a key brings up a menu of alternate characters rather than repeating the key. (This is a feature.) There are many tips on how to re-enable key repeat globally, but you can also control the behavior per application (thanks, Egor Ushakov). This is handy for, e.g., IntelliJ or RubyMine, or any other app that provides Vim-style key bindings. The magic commands are:

% defaults write com.jetbrains.intellij ApplePressAndHoldEnabled -bool false
% defaults write com.jetbrains.rubymine ApplePressAndHoldEnabled -bool false

But how do you figure out what the magic identifier for your application is? Simple: defaults domains will list them all:

defaults domains | gsed -e 's/, /\n/g' | grep jetbrains

Note that gsed (GNU sed) was needed to munge the commas into newlines for grep, because the default OS X sed cannot (easily) insert newlines.
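If GNU sed is not installed, tr can do the comma-to-newline conversion instead. This is a sketch of one alternative, not the method used above; split_domains is a hypothetical helper name, and it assumes the comma-separated output format described above.

```shell
# Hypothetical helper: read a comma-separated list on stdin and print
# one entry per line, trimming the space that follows each comma.
split_domains() {
    tr ',' '\n' | sed 's/^ *//'
}

# Intended use on OS X (left as a comment, since "defaults" exists only there):
#   defaults domains | split_domains | grep jetbrains
```

Because the sed call only strips leading spaces and never has to emit a newline, the stock OS X sed handles it fine.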