I have done my share of shell scripting and debugging, thanks in part to those early years doing system administration and Gentoo Linux development. I have managed to keep those skills sharp, so when my teammate @mclark ran into a shell problem, I was the first person he reached out too. This post is a dramatic retelling of our debugging session, with additional explanation.
He's been leading an effort to partition the GitHub monolith using packwerk. Matt has been working on getting some branches (1, 2) upstream, and it has been slow going for a variety of reasons. No judgement there, I know have been guilty of leaving pull requests unreviewed/unmerged and there are other reasons they are not merged, but we did want to start taking advantage of the improvements they offered.
We have this script that has been around a long time called vendor-gem. Basically, it takes a Git URL and tries to build a .gem file for it in a few steps:
- clones it locally
- tries to
gem build
it - saves it to
vendor/cache
- updates
Gemfile
with the resulting hash
It is a good fit for using mclark's fork of packwerk, and start to use it, and exactly what we do for other dependencies.
Now, I won't try to justify it's existence, or why it does what it does, except to say that it has been used for awhile, and it is occasionally useful. It could very likely be replaced any number of ways, but let us not forget the Parable of Chesteron's fence:
“In the matter of reforming things, as distinct from deforming them, there is one plain and simple principle; a principle which will probably be called a paradox. There exists in such a case a certain institution or law; let us say, for the sake of simplicity, a fence or gate erected across a road. The more modern type of reformer goes gaily up to it and says, “I don’t see the use of this; let us clear it away.” To which the more intelligent type of reformer will do well to answer: “If you don’t see the use of it, I certainly won’t let you clear it away. Go away and think. Then, when you can come back and tell me that you do see the use of it, I may allow you to destroy it.”
But, I digress. The point is that this vendor-gem could be considered technical debt that needs to be dealt with, but in the mean time we have other work to handle.
So let us begin!
The Problem
@technicalpickles you know anything about the
vendor-gem
script? I'm getting an error when trying to vendor my packwerk fork$ script/vendor-gem https://github.com/mclark/packwerk -r 0f92c385c92d378fdbc2d912e4b20c9f524dbd42 Cloning https://github.com/mclark/packwerk for gem build Building packwerk HEAD is now at 0f92c38 parallel packwerk commit 0f92c385c92d378fdbc2d912e4b20c9f524dbd42 (HEAD -> master, parallel-packwerk-check) Author: Matt Clark <44023+mclark@users.noreply.github.com> Date: Fri Apr 9 11:14:09 2021 -0400 parallel packwerk ~/github/github/tmp/gems/packwerk ~/github/github/tmp/gems/packwerk Building packwerk.gemspec Traceback (most recent call last): 3: from -e:1:in \`<main>' 2: from -e:1:in \`eval' 1: from (eval):3:in \`<main>' (eval):3:in \`require\_relative': cannot infer basepath (LoadError)
Before digging in any further, here's what I already know:
- it is a shell script from prior experience
- it looks like a ruby stack trace at the bottom
- the error is
cannot infer basepath (LoadError)
Narrow it down
Opening up the file, you can see a wall of bash code. We could spend some time reading the source to understand it, but I tend to favor finding ways to identify where it is failing.
For bash, a good way to do that is to use set -x
. From the [Bash Guide for Beginners][debugging bash scripts]:
When things don't go according to plan, you need to determine what exactly causes the script to fail. Bash provides extensive debugging features. The most common is to start up the subshell with the -x option, which will run the entire script in debug mode. Traces of each command plus its arguments are printed to standard output after the commands have been expanded but before they are executed.
set -x # activate debugging from here w set +x # stop debugging from here
The top of the file already has a set -eu
, so that's a good spot to add x
for now. Alternatively, we can call it with bash -x
as well to avoid modifying the file.
This gives us more info about what is happening when it fails:
++ ruby -e 'require '\''rubygems'\''; spec=eval(File.read('\''packwerk.gemspec'\'')); print spec.name.to_s'
Traceback (most recent call last):
3: from -e:1:in `<main>'
2: from -e:1:in `eval'
1: from (eval):3:in `<main>'
-e:1:in `eval': cannot infer basepath (LoadError)
+ gemname=
+ rm -rf tmp/gems/packwerk
Making sense of it
The ruby -e
defines a script on the command line without it having make a file.
It looks like we are trying to read the file, and then eval
it in order to pull the gem out of it. .gemspec
basically are creating an instance of [Gem::Specification]
, and looks like:
Gem::Specification.new do |s|
s.name = 'example'
s.version = '0.1.0'
s.licenses = \['MIT'\]
s.summary = "This is an example!"
s.description = "Much longer explanation of the example!"
s.authors = \["Ruby Coder"\]
s.email = 'rubycoder@example.com'
s.files = \["lib/example.rb"\]
s.homepage = 'https://rubygems.org/gems/example'
s.metadata = { "source\_code\_uri" \=> "https://github.com/example/example" }
end
It is literally Ruby, and so it can be read and then eval'd. Because we are evaling it, there's an implict return value of the Gem::Specification
that was just built. That is assigned to spec
, and then returning spec.name
.
That is all to try to get the gem's actual name in the case it doesn't match the repository name. It usually does match, but not always, and repositories can contain multiple gems.
Narrow it down pt. 2
We know which command failed, but can we reproduce it? If you look at the bottom, the directory it was running is in was deleted:
+ rm -rf tmp/gems/packwerk
What could be doing that? Searching the file for rm -rf
is a good way to find it:
# clone the repo under tmp, clean up on exit
echo "Cloning $url for gem build"
mkdir -p "tmp/gems/$gem"
trap "rm -rf tmp/gems/$gem" EXIT
Well, it's cleaning up at exit which explains why it's not there. It's using trap
to catch signals like QUIT
, INT
(control-c), etc. EXIT
is a special signal that bash sends when it's exiting.
This is good code hygiene, but it can make debugging harder in cases like right now. Commenting out is the easiest thing while we debug:
# clone the repo under tmp, clean up on exit
echo "Cloning $url for gem build"
mkdir -p "tmp/gems/$gem"
# trap "rm -rf tmp/gems/$gem" EXIT
If we run the script again, we get the failure again, but now tmp/gems/packwerk
is there.
We need to know where to run the command. tmp/gems/packwerk
is a good guess since it was what was cleaned up. We can also look for cd
or pushd
in the code:
# go in and build the gem using the HEAD version, clean up this tmp dir on exit
echo "Building $gem"
(
cd "tmp/gems/$gem"
Now we can cd tmp/gems/packwerk
and run the command to reproduce it:
$ ruby -e 'require '\''rubygems'\''; spec=eval(File.read('\''packwerk.gemspec'\'')); print spec.name.to_s'
Traceback (most recent call last):
3: from -e:1:in `<main>'
2: from -e:1:in `eval'
1: from (eval):3:in `<main>'
-e:1:in `eval': cannot infer basepath (LoadError)
Making sense of it pt. 2
With just the command and the error, we have a lot less noise to get distracted by. The output is a Ruby error. Normally, it shows one line per file with line information, and the method it is in. We don't actually have much of that. What's more, there's this ruby -e
. If you aren't familiar with it, from man ruby
:
-e command Specifies script from command-line
while telling Ruby not to search the
rest of the arguments for a script
file name.
That means you can put the ruby you want to run as an argument, rather than taking a filename. This is convenient for one-line commands like we are doing right now.
Knowing that, we see -e
is showing up in the stacktrace. I'm taking that to mean, that is what is being interpreted as the file:
$ ruby 💥-e💥 'require '\''rubygems'\''; spec=eval(File.read('\''packwerk.gemspec'\'')); print spec.name.to_s'
Traceback (most recent call last):
3: from 💥-e💥:1:in `<main>'
2: from 💥-e💥:1:in `eval'
1: from (eval):3:in `<main>'
💥-e💥:1:in `eval': cannot infer basepath (LoadError)
Since we are reading a file, and evaling it, it'd be a good time to look at what is in the file:
# frozen_string_literal: true
require_relative "lib/packwerk/version"
Gem::Specification.new do |spec|
spec.name = "packwerk"
spec.version = Packwerk::VERSION
spec.authors = ["Shopify Inc."]
spec.email = ["gems@shopify.com"]
spec.summary = "Packages for applications based on the zeitwerk autoloader"
spec.description = <<~DESCRIPTION
Sets package level boundaries between a specified set of ruby
constants to minimize cross-boundary referencing and dependency.
DESCRIPTION
It's using require_relative
to load a file:
module Packwerk
VERSION = "1.1.3"
end
This is a pretty ubiquitous pattern. Things like bundler new
(and jeweler too if you remember that far back even generate it for you.
Googling
It is a good point to Google the error. Let's just focus on the last part of line of it, strip out -e
and other punctuation, and add ruby
: ruby eval cannot infer basepath (LoadError)
.
This is the first result that comes up: require - ruby require_relative gives LoadError: cannot infer basepath inside IRB - Stack Overflow:
require_relative
requires a file relative to the file the call torequire_relative
is in. Your call to require_relative isn't in any file, it's in the interactive interpreter, therefore it doesn't work.
This question is about using irb
, but a lot of it applies to ruby -e
too.
It's a good time to look for require_relative [Ruby's documentation on require_relative
]:
Ruby tries to load the library named string relative to the requiring file’s path. If the file’s path cannot be determined a LoadError is raised. If a file is loaded true is returned and false otherwise.
Based on that post, it seems that changing require_relative 'lib/packwerk/version'
to require './lib/packwerk/version'
would be enough to work. The problem is it's part of gem's source, so we'd have to replace it as part of our script. This is a little bit of outside the box thinking, but it does feel a little hacky. Let's keep it in our back pocket, but keep working through this.
A red herring, and back on track
While it's useful information about require_relative
, we forgot an important part of our error and original search: eval
. eval
can take a string, and execute it as Ruby. What happens if there are errors while evaling? Exactly like what we saw:
1: from (eval):3:in `<main>'
💥-e💥:1:in `eval': cannot infer basepath (LoadError)
It doesn't matter if running with ruby -e
means there isn't information the current file there. That isn't the context that the error is happening, the context is eval
.
If you have done much meta-programming with Ruby and eval
, you may recall it has some extra arguments. From the [ruby eval][docs]:
eval(string [, binding [, filename [,lineno]]]) → objclick to togg
Evaluates the Ruby expression(s) in string. If binding is given, which must be a Binding object, the evaluation is performed in its context. If the optional filename and lineno parameters are present, they will be used when reporting syntax errors.
The solution
I haven't seen it spelt out anywhere, but I have this hunch that adding a filename as an argument to eval
will fix it. The file being eval'd is the gemspec, so that should be the filename
.
These are positional arguments, so we need to specify the second argument (binding
) too. gemspecs are pretty standalone and won't have any other context they need to bind to, so nil
seems valid.
Let's try this:
$ ruby -e ''require '\''rubygems'\''; spec=eval(File.read('\''packwerk.gemspec'\''), nil, '\''packwerk.gemspec'\''); print spec.name.to_s'
packwerk⏎
Success! Take what we learn back to the script, and leave the set -x
just in case...
gemname=$(ruby -e "require 'rubygems'; spec=eval(File.read('$gemspec'), nil, '$gemspec'); print spec.name.to_s")
And it works! As we review the code, we see a few other spots on other conditional branches using the same logic, so we fix those too. In fact, there's another eval
that is correctly giving a filename:
spec = eval(File.read("$gemspec"), nil, "$gemspec")
spec.version = "$GEM_VERSION"
spec
Cleanup
While we are making the change, I thought it'd be useful to add a debugging option. That way, it's a little easier for the next person debugging. Here's what we ended up with:
: "${debug:=}"
while [ $# -gt 0 ]; do
case "$1" in
-d|--debug)
debug=1
shift
;;
# other options here
esac
done
# snip
if [ -n "$debug" ]; then
set -x
fi
if [ -z "$debug" ]; then
trap "rm -rf tmp/gems/$gem" EXIT
fi
There is some more bashisms and shellisms here that are probably worth further explanation, but this post is long enough already :blush:
In summary, if there's a -d
or --debug
option:
- enable tracing with
set -x
- don't cleanup the gem at the end
Conclusion
With this problem solved, Matt was able to go on to start using his packwerk changes in production. In the end, this ended up being less than an hour of pairing. I spent more time writing this in fact!
The tools, errors, and people will change, but these kinds of problems come up constantly. I've found it valuable to use them as opportunities to learn a bit more in the areas they touch upon. This process, in turn, is it's own skill that you get really good at with enough time and bugs!
Here are some debugging techniques and philsophy you can take away from this:
- try to reproduce the problem
- narrow down where it is happen
- be curious; really read what the messages are telling you
- sometimes you end up chasing down a false lead, but it's okay as long as you learn something from it
- get creative in how you google for things
- don't forget to use known documentation like man pages and project documentation
- leave things a little better than you left it; make it easier for the next person
And some tidbits we learned about bash and ruby:
set -x
generates a lot of output, but is incredibly informative for debuggingtrap
can be used for cleanup when a bash script exits, in addition to normal signalsruby -e
lets you write one-liners without having to have a filerequire_relative
won't work inirb
,ruby -e
oreval
(without a filename)- giving
eval
file and line information will almost always make debugging easier