{"id":16,"date":"2011-04-12T23:59:45","date_gmt":"2011-04-13T06:59:45","guid":{"rendered":"http:\/\/www.lorrin.org\/blog\/?p=16"},"modified":"2012-03-16T23:28:38","modified_gmt":"2012-03-17T06:28:38","slug":"untangling-multi-threaded-log-files","status":"publish","type":"post","link":"https:\/\/www.lorrin.org\/blog\/2011\/04\/12\/untangling-multi-threaded-log-files\/","title":{"rendered":"Untangling multi-threaded log files"},"content":{"rendered":"<p>Yesterday I needed to decipher a log file in which a dozen threads were simultaneously logging messages. Surely there must be tools for this out there. But I couldn&#8217;t find one, so I wrote a Python script to indent each line differently based on thread id. I then looked at it in <a href=\"http:\/\/www.libreoffice.org\/\">LibreOffice<\/a>, but just reading on the terminal would have sufficed. Here&#8217;s a trivial demo:<\/p>\n<pre>2011-04-11 09:40:12,004 [INFO] [http-3] Hello\r\n2011-04-11 09:40:13,554 [DEBUG] [http-1] Wikipedia\r\n(pronounced \/\u02ccw\u026ak\u0268\u02c8pi\u02d0di.\u0259\/ WIK-i-PEE-dee-\u0259)\r\n2011-04-11 09:40:13,605 [INFO] [http-2] PCC Natural Markets\r\n2011-04-11 09:40:13,688 [INFO] [http-3] World\r\n2011-04-11 09:40:14,015 [INFO] [http-2] began as a food-buying club of 15 families in 1953.\r\n2011-04-11 09:40:16,032 [INFO] [http-1] is a multilingual, web-based, free-content encyclopedia project based on an openly editable model\r\n2011-04-11 09:40:17,775 [INFO] [http-2] Today, it's the largest consumer-owned natural food retail co-operative in the United States.<\/pre>\n<p>becomes:<\/p>\n<pre>09:40:12,004\tHello\r\n09:40:13,554\t\tWikipedia\r\n09:40:13,554\t\t(pronounced \/\u02ccw\u026ak\u0268\u02c8pi\u02d0di.\u0259\/ WIK-i-PEE-dee-\u0259)\r\n09:40:13,605\t\t\tPCC Natural Markets\r\n09:40:13,688\tWorld\r\n09:40:14,015\t\t\tbegan as a food-buying club of 15 families in 1953.\r\n09:40:16,032\t\tis a multilingual, web-based, free-content encyclopedia project based on an openly editable model\r\n09:40:17,775\t\t\tToday, it's the largest consumer-owned natural food retail co-operative in the United States.\r\n            \thttp-3\thttp-1\thttp-2<\/pre>\n<p>Just read down the columns vertical for a clear chain of events on each thread. Code below is <a href=\"https:\/\/github.com\/lorrin\/misc-tools-scripts\/blob\/master\/token_indenter.py\">maintained on GitHub<\/a>.<\/p>\n<pre class=\"brush:py\">#!\/usr\/bin\/env python\r\n\r\n# Looks at token in a particular position in each line and indents the line\r\n# differently for each unique identifier found in the file. For example, given\r\n# a log file which contains a thread identifier, contents for each thread will\r\n# be separated out into distinct columns.\r\n#\r\n# Lines not matching the pattern (e.g. stack traces) are presumed to have\r\n# occurred at the time of and belong to the same identifier as the preceding line.\r\n#\r\n# Default pattern is: &lt;date&gt; &lt;stamp&gt; ignored [thread_id] &lt;message&gt;\r\n# Yielding output: &lt;stamp&gt;&lt;tabs&gt;&lt;message&gt;\r\n#\r\n# An alternate regular expression can be supplied on the command line; it must\r\n# include named capture groups 'stamp', 'id', and 'message'. The default regex\r\n# is: ^\\S+ (?P&lt;stamp&gt;\\S+) \\S+ \\[(?P&lt;id&gt;[^\\]]+)\\] (?P&lt;message&gt;.*)\r\n#\r\n# If the input contains very long lines it can be helpful to truncate them\r\n# beforehand by e.g. piping through awk '{print substr($0,0,400)}'\r\n\r\nimport sys,re\r\nif len(sys.argv) &gt; 1:\r\n  pattern = re.compile(sys.argv[1])\r\nelse:\r\n  pattern = re.compile('^\\S+ (?P&lt;stamp&gt;\\S+) \\S+ \\[(?P&lt;id&gt;[^\\]]+)\\] (?P&lt;message&gt;.*)')\r\n\r\ndelimiter='\\t'\r\nmax_level = 1\r\ncategories = {}\r\nlegend = None\r\nindent = \"\"\r\nstamp = \"\"\r\n\r\ntry:\r\n  for line in [l.strip() for l in sys.stdin]:\r\n    m = pattern.match(line)\r\n    if m:\r\n      stamp,identifier,message = [m.group(x) for x in ['stamp','id','message']]\r\n      indent = categories.get(identifier)\r\n      if not legend:\r\n        legend = \" \" * len(stamp)\r\n      if not indent:\r\n        indent = delimiter * max_level\r\n        categories[identifier] = indent\r\n        max_level += 1\r\n        legend += delimiter + identifier\r\n      print stamp + indent + message\r\n    else:\r\n      # carry over stamp and indent from previous line\r\n      print stamp + indent + line\r\n\r\n  print legend\r\nexcept IOError:\r\n  pass<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Yesterday I needed to decipher a log file in which a dozen threads were simultaneously logging messages. Surely there must be tools for this out there. But I couldn&#8217;t find one, so I wrote a Python script to indent each line differently based on thread id. I then looked at it in LibreOffice, but just <a href='https:\/\/www.lorrin.org\/blog\/2011\/04\/12\/untangling-multi-threaded-log-files\/' class='excerpt-more'>[&#8230;]<\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[48],"tags":[12,17,13,11,16],"_links":{"self":[{"href":"https:\/\/www.lorrin.org\/blog\/wp-json\/wp\/v2\/posts\/16"}],"collection":[{"href":"https:\/\/www.lorrin.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.lorrin.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.lorrin.org\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.lorrin.org\/blog\/wp-json\/wp\/v2\/comments?post=16"}],"version-history":[{"count":25,"href":"https:\/\/www.lorrin.org\/blog\/wp-json\/wp\/v2\/posts\/16\/revisions"}],"predecessor-version":[{"id":47,"href":"https:\/\/www.lorrin.org\/blog\/wp-json\/wp\/v2\/posts\/16\/revisions\/47"}],"wp:attachment":[{"href":"https:\/\/www.lorrin.org\/blog\/wp-json\/wp\/v2\/media?parent=16"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.lorrin.org\/blog\/wp-json\/wp\/v2\/categories?post=16"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.lorrin.org\/blog\/wp-json\/wp\/v2\/tags?post=16"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}