Better Programming

Advice for programmers.

How to Build A Text Filtering, Log Simplifying Tool in Go

Stephen Wayne · Published in Better Programming · 6 min read · Aug 8, 2022


This is part one of a four-part series (some parts are still in progress). You can find the others here:

What Are We Building?

In software engineering, we often parse log files to understand the internal state of a program. This can be tedious in the presence of frequent-but-irrelevant (noisy) logs, especially when debugging as a distributed team. To that end, I wrote a simple program in Go to remove any lines of text that contain a substring (or one of a series of substrings) and generate a new file that contains all but the matches. This has proved invaluable over the years — both for my sanity and for the sanity of those who pick up debugging the system where I left off.

Standard tools make it easy to search a log file for the lines you care about, but this program does the opposite: it lets you iterate through a file, removing noisy logs as you find them!

Designing the Tool

Before we start coding, let’s consider how one might use this tool. Given that this is a fairly technical product and only needs a simple interface, a command line interface (CLI) seems appropriate.

What about the inputs? We'll need to provide the source file, the key phrases to cut out of that file, and whether or not we want the file to be edited in place. An optional enhancement would be to accept a path for the output file when not editing in place, or to infer in-place editing from whether an output path was given.

Now, what shall we deliver? We could build binaries for various operating systems and architectures and distribute them, perhaps even through an artifact repository (Artifactory, anyone?). Again, as this is a technical product (and free/open source!), it should be fine to ship a README.md with build and usage instructions and expect our users to build their own binaries.

Building the Tool

Next, we’ll define the technical bits we will need to construct this tool. In our simple case, we need the following components:

  • Flag parsing (for user-provided options)
  • Help output (in case the tool is used incorrectly)
  • A way to determine if a text line matches a given input string
  • An output/destination file to write to
  • A way to cut matching text lines from the input
  • A way to replace the original file if edited “in place”
  • Benchmarks/tests (we’ll cover this in another article)

Let’s start with our main run loop. This will handle the user input flags and the basic logic for editing our text files.

Here, we get the parameters the user supplies for trimming a file and then perform that trimming (and we bail out if there are any errors).
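As a rough illustration of that flow, here is a minimal sketch. The names getConfig, trim, and run are hypothetical, and getConfig and trim are only placeholders for the pieces built in the rest of this article; the point is the shape: read the parameters, trim the file, and exit with a non-zero status on the first error.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// config is a simplified stand-in for the options struct introduced next.
type config struct {
	inputPath string
	keys      []string
	inPlace   bool
}

// getConfig is a hypothetical placeholder for the flag parsing described
// next; here it only checks that some arguments were supplied.
func getConfig(args []string) (*config, error) {
	if len(args) == 0 {
		return nil, errors.New("no flags supplied")
	}
	return &config{}, nil
}

// trim is a hypothetical placeholder for the filtering step built later.
func trim(cfg *config) error { return nil }

// run performs the two steps in order and stops at the first error.
func run(args []string) error {
	cfg, err := getConfig(args)
	if err != nil {
		return fmt.Errorf("reading flags: %w", err)
	}
	if err := trim(cfg); err != nil {
		return fmt.Errorf("trimming %q: %w", cfg.inputPath, err)
	}
	return nil
}

func main() {
	if err := run(os.Args[1:]); err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1) // bail out with a non-zero exit status
	}
}
```

Keeping main as a thin wrapper around a run function that returns an error is a common Go pattern: the logic stays testable, and only main decides how to report failure.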

To get the user-supplied parameters, we'll define a config struct and use the flag package from the standard library.

We define config as a convenience to ourselves and future readers: rather than passing three or more parameters between functions, we can pass a single pointer to a config, which centralizes the definition and any documentation. It also makes future modifications easier if we decide to extend the tool (hint, hint 😉).

We then define our inputs according to our requirements above: a string for the input file path, a string for the key phrases to cut out, and a bool to indicate whether the file should be edited in place. For each flag, we supply a name, a default value, and a usage string that is shown in the help output. Note that each of the flag definition functions (such as flag.String and flag.Bool) returns a pointer to the indicated type rather than a value of that type. flag.Parse() then parses the user-supplied flags from os.Args[1:] and stores the results through these pointers. From there, we validate that the required flags were supplied and return the config to the caller.
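Here is a minimal sketch of the config struct and flag parsing, assuming the flag names -file, -keys, and -inplace from the usage section and a hypothetical parseConfig function. The field name inPlace is also an assumption. It uses a flag.FlagSet instead of the package-level flag.Parse() so it can be called with explicit arguments; that choice is ours, not necessarily the author's.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
	"strings"
)

// config gathers every user-supplied option in one place. The inputPath and
// keys names follow the article; inPlace is an assumed name.
type config struct {
	inputPath string   // file to trim
	keys      []string // a line containing any of these substrings is cut
	inPlace   bool     // overwrite the input file with the result
}

// parseConfig turns command-line arguments (normally os.Args[1:]) into a
// config. It uses its own FlagSet (an assumption made for testability);
// the package-level flag.String/flag.Bool/flag.Parse behave the same way.
func parseConfig(args []string) (*config, error) {
	fs := flag.NewFlagSet("trim", flag.ContinueOnError)

	// Each definition takes a name, a default, and a usage string, and
	// returns a pointer that fs.Parse fills in.
	file := fs.String("file", "", "path to the input file")
	keys := fs.String("keys", "", `substrings to cut, separated by "|"`)
	inPlace := fs.Bool("inplace", false, "replace the input file with the output")

	if err := fs.Parse(args); err != nil {
		return nil, err // the FlagSet has already reported the problem and usage
	}

	// Split "big|small" into ["big", "small"], dropping empty entries:
	// strings.Contains(line, "") is always true and would cut every line.
	var keyList []string
	for _, k := range strings.Split(*keys, "|") {
		if k != "" {
			keyList = append(keyList, k)
		}
	}

	if *file == "" || len(keyList) == 0 {
		fs.PrintDefaults() // help output for incorrect use
		return nil, errors.New("both -file and -keys are required")
	}
	return &config{inputPath: *file, keys: keyList, inPlace: *inPlace}, nil
}

func main() {
	cfg, err := parseConfig(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(2)
	}
	fmt.Printf("%+v\n", *cfg)
}
```

For example, `go run . -file=example/input.txt -keys="big|small" -inplace` would print the parsed config. Note that the pipe must be quoted so the shell does not treat it as a pipeline.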

The business logic to transform the input is next. We first clean up any pre-existing temporary output file, then generate an output file containing all of cfg.inputPath except the lines that contain any of the keys in cfg.keys, and finally replace the original file if the user asked for in-place editing.

Finally, we build the actual logic to parse the input file, check each line against the provided keys, and generate the output.

We start by opening the provided input file (and bailing if there is any issue). Note that we are using a named error return parameter — we’ll discuss that in just a bit.

Since we are dealing with files, we should remember to clean up after ourselves by deferring sourceFile.Close(). (Technically, we don't need to do that for the read-only file in this short-lived program, but it's still good practice. You can dive in more here.)

We then create (or open) our output file for writing, using the same file prefix and directory as the source file. We defer closing it as well, but this time we care about the error: a failed Close on a file we have written to can mean the output is incomplete. So, if there was no earlier error, we set the named return error to whatever outFile.Close() returns.

Log files can be quite large. As a result, this code may perform many small writes, hurting performance. To alleviate this, we’ll use bufio (another standard library package) to provide buffered I/O. Essentially this will batch many small writes into fewer but larger writes that should incur less performance overhead (you can read more about bufio here). We will also be using a buffered reader on the source file.
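To make the batching claim concrete, this sketch counts how many Write calls reach an underlying writer with and without a bufio.Writer. The names countingWriter and writeLines are hypothetical; the counting writer stands in for the output file and never fails, so write errors are ignored here.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
)

// countingWriter is a hypothetical io.Writer that records how many Write
// calls reach it, standing in for the output file. It never fails.
type countingWriter struct{ calls int }

func (c *countingWriter) Write(p []byte) (int, error) {
	c.calls++
	return len(p), nil
}

// writeLines writes n short lines, optionally through a bufio.Writer, and
// returns how many Write calls reached the underlying writer.
func writeLines(n int, buffered bool) int {
	sink := &countingWriter{}
	var w io.Writer = sink
	var bw *bufio.Writer
	if buffered {
		bw = bufio.NewWriter(sink) // default buffer size is 4096 bytes
		w = bw
	}
	for i := 0; i < n; i++ {
		io.WriteString(w, "line\n") // sink never fails, so errors are ignored
	}
	if bw != nil {
		bw.Flush() // push whatever is still buffered to the sink
	}
	return sink.calls
}

func main() {
	fmt.Println("direct:  ", writeLines(1000, false))
	fmt.Println("buffered:", writeLines(1000, true))
}
```

With 1,000 five-byte lines, the direct path makes 1,000 calls, while the buffered path makes only a handful, because a bufio.Writer passes data on only when its buffer fills or Flush is called.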

After opening our source and destination files, we'll create a buffered writer. To avoid losing any data at the end, we'll defer a Flush() call to force anything in the buffer to be written to the destination file, and we'll again set our return error parameter as needed.

We then create a buffered reader on the source file and scan through the contents line by line. If a line does not contain any of the keys as a substring, we write it to the buffered writer (to be written to our destination file eventually). Once all lines have been read, or an error occurs, our deferred calls kick in: assuming no errors, we Flush() the buffer to write anything remaining to the file, close the file, and return successfully.

Using the Tool

The tool is invoked like this:

go run main.go -file="<path/to/src/file>" -keys="<keys to search for|with multiple separated by|pipes>" (and an optional -inplace to replace the source file with the output).

If you clone the project, you can test out some trim operations on example/input.txt such as the following:

Removing every line that contains “hello”:

go run main.go -file="example/input.txt" -keys="hello"

Removing every line containing “world” (i.e., removing all lines of the example file):

go run main.go -file="example/input.txt" -keys="world"

Removing every line containing the words “big” or “small”:

go run main.go -file="example/input.txt" -keys="big|small"

You can look at the README for more information and experiment with the tool to see what you come up with!

Further Improvements

Together we’ve built a simple yet effective tool. Still, there are obvious improvements that can be made.

It would be nice to take a regular expression (regex) as input rather than one or more keys for a substring search. You can find that in Part 2 of this series.

I plan to write a follow-up exploring some of these options. Please let me know in the comments if there is anything else you can think of. I’m also very appreciative of any naming suggestions or code clarity improvements!

I plan to write a further follow-up to discuss basic benchmarking in Go (you might see some early code for that in the GitHub repo for this project).


Written by Stephen Wayne

Backend cloud engineer at HashiCorp. Former Electrical Engineer turned to the dark side.
