Unix Vs. Ruby -- Simple Tasks
12/9/2013
by Gabe Koss
I am often conflicted about whether or not it is alright to use Unix command
line utilities when I am writing Ruby code. Specifically I mean shellescaping
to run an outside command like cat #{@config_path}
.
I generally feel like, for portability reasons if nothing else, it is best to stay 'within' Ruby and utilize the language itself. Unfortunately there are many situations where reaching out to into the server environment and accessing common tools is much simpler to code.
I decided to create some tests cases to see what the impact of these two approaches might be for different, commont tasks.
Disclaimer: These tests are in no way complete, nor will they ever be. Each of these scenarios has many different solutions. I'm open to any feedback or additional tests. I also apprciate that is incredible hard to do meaningful benchmarks but my goal was to see if there were any instances where even adding the overhead of the seperate process to run the Unix tool was more efficient than handling in native Ruby.
These tests were done on a pretty beefy laptop with Ruby-2.0.0 and running Crunchbang linux.
Setup
I ran each of these tests from the same file, which had this in the header.
require 'benchmark'
The source of the test are available here
Test 1: Read a string
The goal of the first test was to read a large file into a ruby variable.
Ruby accomplished this with File.read(). The Unix variant used cat
.
Code
Benchmark.bm do |bmark|
bmark.report(:ruby) do
100.times do
result = File.read("/usr/share/dict/words")
end
end
bmark.report(:unix) do
100.times do
result = `cat /usr/share/dict/words`
end
end
end
Results
user system total real
ruby 0.090000 0.030000 0.120000 ( 0.126570)
unix 0.370000 0.150000 0.520000 ( 0.786349)
I had somehow expected that the Unix variant would be faster. I was very surprised that ruby was so much faster, but I have not investigated further.
Test 2: First element from a list
In the second test I used the same file and ran three tests:
- Ruby 1: Read the file, split on newlines, retrieve first Array element
- Ruby 2: Read the file and grab the first line which matches the regex
/\A^[a-zA-Z]*$/
. - Ruby 3: Read the file and grab the lines which matches the regex
/^[a-zA-Z]*$/
. - Unix: Use
sed 1q <
to grab the first line
Code
Benchmark.bm do |bmark|
bmark.report(:ruby1) do
100.times do
result = File.read("/usr/share/dict/words").split("\n")[0]
end
end
bmark.report(:ruby2) do
100.times do
result = File.read("/usr/share/dict/words").scan(/\A^[a-zA-Z]*$/)[0]
end
end
bmark.report(:ruby3) do
100.times do
result = File.read("/usr/share/dict/words").scan(/^[a-zA-Z]*$/)[0]
end
end
bmark.report(:unix) do
100.times do
result = `sed 1q < /usr/share/dict/words`
end
end
end
Results
user system total real
ruby1 3.090000 0.040000 3.130000 ( 3.144695)
ruby2 0.060000 0.040000 0.100000 ( 0.098774)
ruby3 10.760000 0.030000 10.790000 ( 10.815407)
unix 0.000000 0.090000 0.090000 ( 0.411978)
The second ruby one is fast, but limited in flexibility because the \A
in
the regex only grabs the first match at the top of the file. As expected
accessing the large arrays in the first and third examples is pretty sluggish!
Test 3: Pattern Matching!
So In this final test I decided to hunt for the word "sass" in my file.
For this there were several variants:
- Ruby 1: Read file, split into array and grab matches using
Array#select
and=~
. - Ruby 2: Read file and scan for
/^[a-z]*sass[a-z]*$/i
- Unix 1: Use grep to search for the pattern
- Unix 2: Hybrid variant which introduces the overhead of converting results into a Ruby Array object.
Code
Benchmark.bm do |bmark|
bmark.report(:ruby1) do
100.times do
result = File.read("/usr/share/dict/words").split("\n").select{|i| i =~ /sass/ }
end
end
bmark.report(:ruby2) do
100.times do
result = File.read("/usr/share/dict/words").scan(/^[a-z]*sass[a-z]*$/i)
end
end
bmark.report(:unix1) do
100.times do
result = `grep sass /usr/share/dict/words`
end
end
bmark.report(:unix2) do
100.times do
result = `grep sass /usr/share/dict/words`.split("\n")
end
end
end
Result
user system total real
ruby1 8.180000 0.050000 8.230000 ( 8.245998)
ruby2 15.930000 0.030000 15.960000 ( 15.995523)
unix1 0.010000 0.140000 0.940000 ( 1.225480)
unix2 0.010000 0.140000 0.930000 ( 1.208540)
As expected, working with huge strings like this is where the tiny unix utils really start to shine even with the overhead of a being run from inside Ruby.