Debugging Sessions: vendor-gem, bash, ruby, and require_relative

I have done my share of shell scripting and debugging, thanks in part to those early years doing system administration and Gentoo Linux development. I have managed to keep those skills sharp, so when my teammate @mclark ran into a shell problem, I was the first person he reached out too. This post is a dramatic retelling of our debugging session, with additional explanation.

He's been leading an effort to partition the GitHub monolith using packwerk. Matt has been working on getting some branches (1, 2) upstream, and it has been slow going for a variety of reasons. No judgement there, I know have been guilty of leaving pull requests unreviewed/unmerged and there are other reasons they are not merged, but we did want to start taking advantage of the improvements they offered.

We have this script that has been around a long time called vendor-gem. Basically, it takes a Git URL and tries to build a .gem file for it in a few steps:

clones it locally
tries to gem build it
saves it to vendor/cache
updates Gemfile with the resulting hash

It is a good fit for using mclark's fork of packwerk, and start to use it, and exactly what we do for other dependencies.

Now, I won't try to justify it's existence, or why it does what it does, except to say that it has been used for awhile, and it is occasionally useful. It could very likely be replaced any number of ways, but let us not forget the Parable of Chesteron's fence:

“In the matter of reforming things, as distinct from deforming them, there is one plain and simple principle; a principle which will probably be called a paradox. There exists in such a case a certain institution or law; let us say, for the sake of simplicity, a fence or gate erected across a road. The more modern type of reformer goes gaily up to it and says, “I don’t see the use of this; let us clear it away.” To which the more intelligent type of reformer will do well to answer: “If you don’t see the use of it, I certainly won’t let you clear it away. Go away and think. Then, when you can come back and tell me that you do see the use of it, I may allow you to destroy it.”

But, I digress. The point is that this vendor-gem could be considered technical debt that needs to be dealt with, but in the mean time we have other work to handle.

So let us begin!

The Problem

@technicalpickles you know anything about the vendor-gem script? I'm getting an error when trying to vendor my packwerk fork

$ script/vendor-gem https://github.com/mclark/packwerk -r 0f92c385c92d378fdbc2d912e4b20c9f524dbd42  
Cloning https://github.com/mclark/packwerk for gem build  
Building packwerk  
HEAD is now at 0f92c38 parallel packwerk  
commit 0f92c385c92d378fdbc2d912e4b20c9f524dbd42 (HEAD -> master, parallel-packwerk-check)  
Author: Matt Clark <44023+mclark@users.noreply.github.com>  
Date:   Fri Apr 9 11:14:09 2021 -0400    parallel packwerk  
~/github/github/tmp/gems/packwerk ~/github/github/tmp/gems/packwerk  
Building packwerk.gemspec  
Traceback (most recent call last):  
	3: from -e:1:in \`<main>'  
	2: from -e:1:in \`eval'  
	1: from (eval):3:in \`<main>'  
(eval):3:in \`require\_relative': cannot infer basepath (LoadError)

Before digging in any further, here's what I already know:

it is a shell script from prior experience
it looks like a ruby stack trace at the bottom
the error is cannot infer basepath (LoadError)

Narrow it down

Opening up the file, you can see a wall of bash code. We could spend some time reading the source to understand it, but I tend to favor finding ways to identify where it is failing.

For bash, a good way to do that is to use set -x. From the [Bash Guide for Beginners][debugging bash scripts]:

When things don't go according to plan, you need to determine what exactly causes the script to fail. Bash provides extensive debugging features. The most common is to start up the subshell with the -x option, which will run the entire script in debug mode. Traces of each command plus its arguments are printed to standard output after the commands have been expanded but before they are executed.

set -x			# activate debugging from here
w
set +x			# stop debugging from here

The top of the file already has a set -eu, so that's a good spot to add x for now. Alternatively, we can call it with bash -x as well to avoid modifying the file.

This gives us more info about what is happening when it fails:

++ ruby -e 'require '\''rubygems'\''; spec=eval(File.read('\''packwerk.gemspec'\'')); print spec.name.to_s'
Traceback (most recent call last):
        3: from -e:1:in `<main>'
        2: from -e:1:in `eval'
        1: from (eval):3:in `<main>'
-e:1:in `eval': cannot infer basepath (LoadError)
+ gemname=
+ rm -rf tmp/gems/packwerk

Making sense of it

The ruby -e defines a script on the command line without it having make a file.

It looks like we are trying to read the file, and then eval it in order to pull the gem out of it. .gemspec basically are creating an instance of [Gem::Specification], and looks like:

Gem::Specification.new do |s|
  s.name        = 'example'
  s.version     = '0.1.0'
  s.licenses    = \['MIT'\]
  s.summary     = "This is an example!"
  s.description = "Much longer explanation of the example!"
  s.authors     = \["Ruby Coder"\]
  s.email       = 'rubycoder@example.com'
  s.files       = \["lib/example.rb"\]
  s.homepage    = 'https://rubygems.org/gems/example'
  s.metadata    = { "source\_code\_uri" \=> "https://github.com/example/example" }
end

It is literally Ruby, and so it can be read and then eval'd. Because we are evaling it, there's an implict return value of the Gem::Specification that was just built. That is assigned to spec, and then returning spec.name.

That is all to try to get the gem's actual name in the case it doesn't match the repository name. It usually does match, but not always, and repositories can contain multiple gems.

Narrow it down pt. 2

We know which command failed, but can we reproduce it? If you look at the bottom, the directory it was running is in was deleted:

+ rm -rf tmp/gems/packwerk

What could be doing that? Searching the file for rm -rf is a good way to find it:

# clone the repo under tmp, clean up on exit
echo "Cloning $url for gem build"
mkdir -p "tmp/gems/$gem"
trap "rm -rf tmp/gems/$gem" EXIT

Well, it's cleaning up at exit which explains why it's not there. It's using trap to catch signals like QUIT, INT (control-c), etc. EXIT is a special signal that bash sends when it's exiting.

This is good code hygiene, but it can make debugging harder in cases like right now. Commenting out is the easiest thing while we debug:

# clone the repo under tmp, clean up on exit
echo "Cloning $url for gem build"
mkdir -p "tmp/gems/$gem"
# trap "rm -rf tmp/gems/$gem" EXIT

If we run the script again, we get the failure again, but now tmp/gems/packwerk is there.

We need to know where to run the command. tmp/gems/packwerk is a good guess since it was what was cleaned up. We can also look for cd or pushd in the code:

# go in and build the gem using the HEAD version, clean up this tmp dir on exit
echo "Building $gem"
(
    cd "tmp/gems/$gem"

Now we can cd tmp/gems/packwerk and run the command to reproduce it:

$ ruby -e 'require '\''rubygems'\''; spec=eval(File.read('\''packwerk.gemspec'\'')); print spec.name.to_s'
Traceback (most recent call last):
        3: from -e:1:in `<main>'
        2: from -e:1:in `eval'
        1: from (eval):3:in `<main>'
-e:1:in `eval': cannot infer basepath (LoadError)

Making sense of it pt. 2

With just the command and the error, we have a lot less noise to get distracted by. The output is a Ruby error. Normally, it shows one line per file with line information, and the method it is in. We don't actually have much of that. What's more, there's this ruby -e. If you aren't familiar with it, from man ruby:

     -e command     Specifies script from command-line
                    while telling Ruby not to search the
                    rest of the arguments for a script
                    file name.

That means you can put the ruby you want to run as an argument, rather than taking a filename. This is convenient for one-line commands like we are doing right now.

Knowing that, we see -e is showing up in the stacktrace. I'm taking that to mean, that is what is being interpreted as the file:

$ ruby 💥-e💥 'require '\''rubygems'\''; spec=eval(File.read('\''packwerk.gemspec'\'')); print spec.name.to_s'
Traceback (most recent call last):
        3: from 💥-e💥:1:in `<main>'
        2: from 💥-e💥:1:in `eval'
        1: from (eval):3:in `<main>'
💥-e💥:1:in `eval': cannot infer basepath (LoadError)

Since we are reading a file, and evaling it, it'd be a good time to look at what is in the file:

# frozen_string_literal: true

require_relative "lib/packwerk/version"

Gem::Specification.new do |spec|
  spec.name          = "packwerk"
  spec.version       = Packwerk::VERSION
  spec.authors       = ["Shopify Inc."]
  spec.email         = ["gems@shopify.com"]

  spec.summary       = "Packages for applications based on the zeitwerk autoloader"

  spec.description   = <<~DESCRIPTION
    Sets package level boundaries between a specified set of ruby
    constants to minimize cross-boundary referencing and dependency.
  DESCRIPTION

It's using require_relative to load a file:

module Packwerk
  VERSION = "1.1.3"
end

This is a pretty ubiquitous pattern. Things like bundler new (and jeweler too if you remember that far back even generate it for you.

Googling

It is a good point to Google the error. Let's just focus on the last part of line of it, strip out -e and other punctuation, and add ruby: ruby eval cannot infer basepath (LoadError).

This is the first result that comes up: require - ruby require_relative gives LoadError: cannot infer basepath inside IRB - Stack Overflow:

require_relative requires a file relative to the file the call to require_relative is in. Your call to require_relative isn't in any file, it's in the interactive interpreter, therefore it doesn't work.

This question is about using irb, but a lot of it applies to ruby -e too.

It's a good time to look for require_relative [Ruby's documentation on require_relative]:

Ruby tries to load the library named string relative to the requiring file’s path. If the file’s path cannot be determined a LoadError is raised. If a file is loaded true is returned and false otherwise.

Based on that post, it seems that changing require_relative 'lib/packwerk/version' to require './lib/packwerk/version' would be enough to work. The problem is it's part of gem's source, so we'd have to replace it as part of our script. This is a little bit of outside the box thinking, but it does feel a little hacky. Let's keep it in our back pocket, but keep working through this.

A red herring, and back on track

While it's useful information about require_relative, we forgot an important part of our error and original search: eval. eval can take a string, and execute it as Ruby. What happens if there are errors while evaling? Exactly like what we saw:

        1: from (eval):3:in `<main>'
💥-e💥:1:in `eval': cannot infer basepath (LoadError)

It doesn't matter if running with ruby -e means there isn't information the current file there. That isn't the context that the error is happening, the context is eval.

If you have done much meta-programming with Ruby and eval, you may recall it has some extra arguments. From the [ruby eval][docs]:

eval(string [, binding [, filename [,lineno]]]) → objclick to togg
Evaluates the Ruby expression(s) in string. If binding is given, which must be a Binding object, the evaluation is performed in its context. If the optional filename and lineno parameters are present, they will be used when reporting syntax errors.

The solution

I haven't seen it spelt out anywhere, but I have this hunch that adding a filename as an argument to eval will fix it. The file being eval'd is the gemspec, so that should be the filename.

These are positional arguments, so we need to specify the second argument (binding) too. gemspecs are pretty standalone and won't have any other context they need to bind to, so nil seems valid.

Let's try this:

$ ruby -e ''require '\''rubygems'\''; spec=eval(File.read('\''packwerk.gemspec'\''), nil, '\''packwerk.gemspec'\''); print spec.name.to_s'

packwerk⏎

Success! Take what we learn back to the script, and leave the set -x just in case...

    gemname=$(ruby -e "require 'rubygems'; spec=eval(File.read('$gemspec'), nil, '$gemspec'); print spec.name.to_s")

And it works! As we review the code, we see a few other spots on other conditional branches using the same logic, so we fix those too. In fact, there's another eval that is correctly giving a filename:

  spec = eval(File.read("$gemspec"), nil, "$gemspec")
  spec.version = "$GEM_VERSION"
  spec

Cleanup

While we are making the change, I thought it'd be useful to add a debugging option. That way, it's a little easier for the next person debugging. Here's what we ended up with:

: "${debug:=}"
while [ $# -gt 0 ]; do
        case "$1" in
                -d|--debug)
                        debug=1
                        shift
                        ;;
		            # other options here
        esac
done

# snip

if [ -n "$debug" ]; then
    set -x
fi

if [ -z "$debug" ]; then
    trap "rm -rf tmp/gems/$gem" EXIT
fi

There is some more bashisms and shellisms here that are probably worth further explanation, but this post is long enough already :blush:

In summary, if there's a -d or --debug option:

enable tracing with set -x
don't cleanup the gem at the end

Conclusion

With this problem solved, Matt was able to go on to start using his packwerk changes in production. In the end, this ended up being less than an hour of pairing. I spent more time writing this in fact!

The tools, errors, and people will change, but these kinds of problems come up constantly. I've found it valuable to use them as opportunities to learn a bit more in the areas they touch upon. This process, in turn, is it's own skill that you get really good at with enough time and bugs!

Here are some debugging techniques and philsophy you can take away from this:

try to reproduce the problem
narrow down where it is happen
be curious; really read what the messages are telling you
sometimes you end up chasing down a false lead, but it's okay as long as you learn something from it
get creative in how you google for things
don't forget to use known documentation like man pages and project documentation
leave things a little better than you left it; make it easier for the next person

And some tidbits we learned about bash and ruby:

set -x generates a lot of output, but is incredibly informative for debugging
trap can be used for cleanup when a bash script exits, in addition to normal signals
ruby -e lets you write one-liners without having to have a file
require_relative won't work in irb, ruby -e or eval (without a filename)
giving eval file and line information will almost always make debugging easier