Unix make vs Apache Airflow
In an IEEE Software “Adventures in Code” column titled
Modular Data Analytics
I describe the benefits and use of
simple-rolap,
a tool suite for relational online analytical processing.
I have built simple-rolap based on the Unix make tool and a few
shell scripts.
With make approaching its 50th birthday,
before writing the column
I looked for possible modern and better alternatives I might be ignoring.
Continue reading "Unix make vs Apache Airflow"Last modified: Tuesday, October 15, 2024 2:19 pm
Extending the life of TomTom wearables
TomTom recently announced
it would stop operating their supporting infrastructure by the end of September
following its earlier decision
to exit the wearables market.
This means that its products, such as sports watches, will become effectively
useless, as they will no longer be able to export their activities and
sync them with tracker sites.
Throwing away an otherwise fine watch only because its maker decided to
shut down its proprietary infrastructure seems like a sad waste.
Here is how you can download the watch’s data and
upload it to Strava, a popular activity tracker,
using open source software.
Continue reading "Extending the life of TomTom wearables"Last modified: Friday, November 3, 2023 5:57 pm
Rather than alchemy, methodical troubleshooting
I recently encountered a pesky problem while trying to
build a React Native project under Apple’s Xcode.
The build would fail with an error reporting
“EMFILE: too many open files, watch”.
Frustratingly, all available advice on the web pointed to
different (often inexplicable) directions, none of which worked.
After tormenting myself with these, I decided to troubleshoot
the problem methodically, which allowed me to pinpoint it and
solve it with an uncommon and noteworthy application of the
git bisect command.
Here’s the story.
Continue reading "Rather than alchemy, methodical troubleshooting"Last modified: Saturday, November 27, 2021 8:21 pm
The Evolution of the Unix System Architecture
Unix has evolved for more than five decades,
shaping modern operating systems,
key software technologies, and development practices.
Studying the evolution of this remarkable system from an
architectural perspective can provide insights
on how to manage the growth of large, complex, and long-lived software systems.
In 2016 my colleague Paris Avgeriou
and I embarked on this study aiming to combine
his software architecture insights with my software analytics skills.
Here is a brief summary of the study, which was published this month
in the IEEE Transactions on Software Engineering.
Continue reading "The Evolution of the Unix System Architecture"Last modified: Friday, June 18, 2021 1:39 pm
Reviving the 1973 Unix text to voice translator
The early Research Edition Unix versions featured a program that would turn
a stream of ASCII text into utterances that could be played by a voice
synthesizer.
The source code of this program was lost for years.
Here’s the story of how I brought it back to life.
Continue reading "Reviving the 1973 Unix text to voice translator"Last modified: Saturday, January 2, 2021 4:49 pm
Error handling under Unix and Windows
One thing that struck me when I first encountered the 4.3BSD Unix
system call documentation in the 1980s was that each call was followed
by an exhaustive list of the errors associated with it.
Ten years later, when I was going through the Windows API, I was
disappointed to see that very few functions documented their error
conditions.
This is a big deal.
Continue reading "Error handling under Unix and Windows"Last modified: Wednesday, September 30, 2020 0:00 am
Shell scripting for software developers
In an open online edX course on Unix tools I was running over the spring with more than a thousand registered learners,
I got
asked for ideas on how shell scripts can be useful.
This is an intriguing question, because the course focuses mainly
on performing one-off tasks in the areas of software development,
data engineering, and system administration, rather than automation
through shell scripts.
In response, I posted
how shell scripting improves my personal productivity.
Here’s my take on how shell scripts are employed in diverse software
development tasks.
I plan to post further installments on system administration and data analytics.
Continue reading "Shell scripting for software developers"Last modified: Thursday, August 27, 2020 7:26 pm
Shell scripting for personal productivity
In an edX course on Unix tools that I am currently running,
I got
asked for ideas on how shell scripts can be useful.
This is a very interesting question, because the course focuses mainly
on performing one-off tasks in the areas of software development,
data engineering, and system administration, rather than automation
through shell scripts.
Here’s how I’m using shell scripting to enhance my personal productivity.
I’ll post further installments regarding software development
and system administration.
Continue reading "Shell scripting for personal productivity"Last modified: Monday, March 23, 2020 11:43 am
Seven reasons to add Unix command line expertise to your tool chest
On Tuesday March 17th 2020 my free massive open online course (MOOC)
on the use of Unix command line tools
for data, software, and production engineering
goes live on the edX platform.
Already more than one thousand participants from around the world
have registered for it;
you should still be able to enroll through
this link.
In response to the course’s announcement
seasoned researchers from around the world have commented that this is an
indispensable course
and that it is
very hard to beat the ROI of acquiring this skillset, both for academia and industry.
In an age of shiny IDEs and cool GUI tools, what are the reasons for
the enduring utility and popularity of the Unix command line tools?
Here’s my take.
Continue reading "Seven reasons to add Unix command line expertise to your tool chest"Last modified: Monday, March 16, 2020 0:34 am
Was Knuth Really Framed by Jon Bentley?
Recently, the formal methods specialist
Hillel Wayne posted an interesting
article
discussing whether Donald Knuth was actually framed
when Jon Bentley asked him to demonstrate literate programming.
(Knuth came up with an eight-page-long monolithic listing,
whereas in a critique Doug McIlroy provided a six-line shell script.)
The article makes many interesting and valid points.
However, one of the points it raises
is that the specified problem was ideal for solving with
Unix tools, and that a different problem, such as
“find the top K pairs of words and print the Levenshtein
distance between each pair”,
would be much more difficult to solve with Unix commands.
As the developer of an
edX massive open online course (MOOC) on the use of Unix Tools for data, software and production engineering
I decided to put this claim to the test.
Continue reading "Was Knuth Really Framed by Jon Bentley?"Last modified: Tuesday, February 25, 2020 9:53 pm
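For readers who have not seen it, McIlroy's script is widely quoted in roughly the following form. This is a sketch reproduced from memory and wrapped in a hypothetical function, top_words, so that it can be tried on any text; the argument is the number of words to print.

```shell
# top_words K: print the K most frequent words read from standard input.
top_words() {
    tr -cs A-Za-z '\n' |   # put each word on its own line
    tr A-Z a-z |           # fold everything to lower case
    sort |                 # bring identical words together
    uniq -c |              # count each distinct word
    sort -rn |             # order by descending count
    sed "${1}q"            # stop after the first K lines
}
```

For example, `printf 'the cat the dog\n' | top_words 1` should report the word "the" with a count of 2.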
Convert file I/O into pipe I/O with /dev/fd
Some Unix commands read data from files or write data to files,
without offering an obvious way to use them as part of a pipeline.
How can you write a program to interact with such a command
in a streaming fashion?
This would allow your program and the command to run concurrently,
without the storage and I/O overhead of a temporary file.
You could create and use a named pipe, but this is a clunky solution,
requiring you to create and destroy a unique underlying file name.
Here’s a better approach.
Continue reading "Convert file I/O into pipe I/O with /dev/fd"Last modified: Saturday, December 14, 2019 3:19 pm
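As a minimal sketch of the general idea, assuming a system that provides /dev/fd (Linux, the BSDs, and macOS do), a command that insists on file name arguments can be handed the names of its own standard input and output. The function file_only_upper below is a hypothetical stand-in for such a command, not a real tool.

```shell
# Hypothetical stand-in for a command that only accepts file names:
# it reads the file named by $1 and writes the file named by $2.
file_only_upper() {
    tr '[:lower:]' '[:upper:]' <"$1" >"$2"
}

# Use it inside a pipeline by naming the already-open descriptors:
# /dev/fd/0 is standard input and /dev/fd/1 is standard output.
stream_upper() {
    file_only_upper /dev/fd/0 /dev/fd/1
}

# Usage: printf 'hello\n' | stream_upper | cat
```

No temporary file or named pipe is created; the data flows through the pipes that the shell has already set up.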
How to monitor MySQL / MariaDB query progress
The progress indicator of MySQL or MariaDB long-running commands and queries is
extremely and frustratingly coarse.
An index update I’m running now has been stuck in the same state for
more than three hours.
Thankfully, the pmonitor tool allows us to
precisely monitor the progress of many commands.
Here’s an example of its application on MariaDB.
Continue reading "How to monitor MySQL / MariaDB query progress"Last modified: Sunday, November 3, 2019 2:13 pm
Java Stream Methods and Unix Pipeline Commands: A Dictionary
While preparing my class notes for functional programming in Java
I was struck by the neat correspondence between many Java Stream
methods and Unix commands.
I decided to organize the most common of these in a dictionary form
that allows the mapping between the two.
I’d very much welcome comments regarding common patterns that I’ve missed.
Continue reading "Java Stream Methods and Unix Pipeline Commands: A Dictionary"Last modified: Thursday, December 6, 2018 9:42 pm
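To give a flavor of this kind of correspondence, here is a small sketch. The pairings in the comments are illustrative guesses and are not taken from the dictionary itself; the function name pipeline is likewise hypothetical.

```shell
# Rough shell counterpart of the Java expression
#   lines.filter(s -> s.contains("a")).distinct().sorted().limit(2)
pipeline() {
    grep a |      # filter: keep lines containing the letter a
    sort -u |     # distinct and sorted in a single step
    head -n 2     # limit: keep the first two results
}
```

Feeding it the lines banana, apple, cherry, apple, and avocado should yield apple and avocado.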
How I slashed a SQL query runtime from 380 hours to 12 with two Unix commands
I was trying to run a simple join query
on MariaDB (MySQL) and its performance was horrendous.
Here’s how I cut down the query’s run time from over
380 hours to under 12 hours by executing part of it
with two simple Unix commands.
Continue reading "How I slashed a SQL query runtime from 380 hours to 12 with two Unix commands"Last modified: Sunday, August 5, 2018 8:20 pm
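The teaser does not name the two commands. As an assumption for illustration only, sort and join are a natural pair for evaluating an equi-join outside the database once the tables have been exported as text. The function name and the one-key-per-line file layout below are hypothetical.

```shell
# join_on_first_field A B: inner-join two text files on their first
# whitespace-separated field, roughly like
#   SELECT ... FROM A INNER JOIN B ON A.key = B.key
join_on_first_field() {
    jf_a=$(mktemp)
    jf_b=$(mktemp)
    # join expects both inputs ordered on the join field; the C locale
    # keeps the collation used by sort and join consistent.
    LC_ALL=C sort -k 1,1 "$1" >"$jf_a"
    LC_ALL=C sort -k 1,1 "$2" >"$jf_b"
    LC_ALL=C join "$jf_a" "$jf_b"
    rm -f "$jf_a" "$jf_b"
}
```

Both steps stream through the data sequentially, which is why this style of processing can cope with tables much larger than main memory.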
How to Perform Set Operations on Terabyte Files
The Unix sort command can efficiently handle files of arbitrary size
(think of terabytes).
It does this
by loading into main memory all the data that can fit into it (say 16GB),
sorting that data efficiently using an O(N log N) algorithm,
and then merge-sorting the chunks with a linear complexity O(N) cost.
If the number of sorted chunks is higher than the number of file descriptors
that the merge operation can simultaneously keep open
(typically more than 1000),
then sort will recursively merge-sort intermediate merged files.
Once you have at hand sorted files with unique elements,
you can efficiently perform set operations with them through linear
complexity O(N) operations.
Here is how to do it.
Continue reading "How to Perform Set Operations on Terabyte Files"Last modified: Tuesday, April 3, 2018 8:44 pm
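The linear-time set operations described above can be sketched with standard tools. The helper names are hypothetical; each one assumes its two arguments are files that are already sorted and free of duplicates, for example the output of sort -u.

```shell
# Each helper makes a single sequential pass over two sorted files.
set_union()        { sort -m -u "$1" "$2"; }   # merge only, no re-sorting
set_intersection() { comm -12 "$1" "$2"; }     # lines present in both files
set_difference()   { comm -23 "$1" "$2"; }     # lines only in the first file

# Example with two small throwaway files
so_dir=$(mktemp -d)
printf '%s\n' a b c | sort -u >"$so_dir/x"
printf '%s\n' b c d | sort -u >"$so_dir/y"
set_intersection "$so_dir/x" "$so_dir/y"
```

The files must have been sorted under the same locale that comm runs in; otherwise comm may complain that its input is not in sorted order.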
Reviving the 1973 Unix Programmer’s Manual
The 1973 Fourth Edition of the Unix Programmer’s Manual doesn’t
seem to be available online in typeset form.
This is how I managed to recreate it from its source code.
Continue reading "Reviving the 1973 Unix Programmer’s Manual"Last modified: Sunday, November 19, 2017 2:36 pm
How I Recovered my Firefox Tab Groups
When I quit and restarted Firefox today
I received an unwelcome shock.
All my tab groups, which I maintained using the
Tab Groups by Quicksaver
plugin, were gone!
This happened because it had been upgraded to Firefox Quantum (57),
whose API does not maintain backward compatibility with the one used by the
plugin.
Although I knew the plugin would one day stop working,
I thought there would be some last-minute warning and chance to export
the tab groups.
Continue reading "How I Recovered my Firefox Tab Groups"Last modified: Saturday, November 18, 2017 10:04 am
The Origins of Malloc
The 1973 Fourth Edition Unix kernel source code contains two routines,
malloc and
mfree,
that manage the dynamic allocation and release
of main memory blocks for in-memory processes and
of contiguous disk swap area blocks for swapped-out processes.
Their implementation and history can teach us many things regarding
modern computing.
Continue reading "The Origins of Malloc"Last modified: Thursday, September 14, 2017 11:47 am
Display Git’s and Current Directory on Terminal Bar
I typically have more than ten windows open on my desktop and rely
on their names to select them.
As I’m a command-line aficionado, most of them are terminals.
I have them configured to display the current directory by
setting the bash PROMPT_COMMAND
environment variable to
'printf "\033]0;%s:%s\007" "${HOSTNAME%%.*}" "${PWD/#$HOME/~}"'.
The problem is that the directory I’m often in has a generic name,
such as src or doc, so the terminal’s name isn’t very useful.
Continue reading "Display Git’s and Current Directory on Terminal Bar"Last modified: Thursday, August 10, 2017 7:51 pm
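One possible way to make the title more informative, sketched here under the assumption that git is installed, is to prefix the directory with the name of the enclosing Git repository. The helper title_text is hypothetical and is not necessarily the approach described in the full post.

```shell
# Print "repo:dir" inside a Git work tree, or just "dir" elsewhere.
title_text() {
    tt_top=$(git rev-parse --show-toplevel 2>/dev/null)
    if [ -n "$tt_top" ]; then
        printf '%s:%s' "$(basename "$tt_top")" "$(basename "$PWD")"
    else
        printf '%s' "$(basename "$PWD")"
    fi
}

# In bash, refresh the terminal title before each prompt.
PROMPT_COMMAND='printf "\033]0;%s\007" "$(title_text)"'
```

With this in place, a terminal sitting in the src directory of a repository named myproject would be titled myproject:src.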
Unix Architecture Evolution Diagrams
Today I put online two
diagrams depicting the architecture of the Unix operating system,
one for the 1972 First Research Edition and one for FreeBSD,
one of its direct descendants.
Here are the details on how I created these diagrams.
Continue reading "Unix Architecture Evolution Diagrams"Last modified: Wednesday, May 10, 2017 4:20 pm
The 1980s Research Unix Editions Are Now Available for Study
In 2002 Caldera International
licensed
the source code distribution of several historic Unix editions.
This included all Research Unix editions up to the Seventh Edition,
but excluded the 1980s 8th, 9th, and 10th Editions.
This was unfortunate, because these editions pioneered or implemented several
features that were very advanced at the time, such as
streams inter-process
communication,
graphics terminals and the associated Sam text editor,
network filesystems, and
graphics typesetting tools.
Continue reading "The 1980s Research Unix Editions Are Now Available for Study"Last modified: Tuesday, March 28, 2017 5:05 pm
How to avoid redoing manual corrections
Say you have an automated process to create a report, which you then have to
polish by hand, because there are adjustments that require human judgment.
After three hours of polishing, you realize that the report is full of errors
due to a bug in the initial reporting process.
Is there a way to salvage the three hours of work you put into it?
Continue reading "How to avoid redoing manual corrections"Last modified: Monday, January 16, 2017 2:10 pm
Verifying the Substitution Cipher Folklore
A substitution cipher has each letter substituted with another.
Cryptography folklore has it that simple substitution ciphers
are trivial
to break by looking at the letter frequencies of the encrypted text.
I tested the folklore and the results were not quite what I was expecting.
Continue reading "Verifying the Substitution Cipher Folklore"Last modified: Friday, March 18, 2016 10:30 am
The Birth of Standard Error
Earlier today Stephen Johnson, in a mailing list run by
The Unix Heritage Society,
described the birth of the standard error concept:
the idea that a program's error output is sent on a channel
different from that of its normal output.
Over the past forty years, all major operating systems and language libraries
have embraced this concept.
Continue reading "The Birth of Standard Error"Last modified: Monday, August 5, 2024 2:09 pm
How to Calculate an Operation's Memory Consumption
How can you determine how much memory is consumed by a specific
operation of a Unix program?
Valgrind's Massif subsystem could help you in this regard,
but it can be difficult to isolate a specific operation from
Massif's output.
Here is another, simpler way.
Continue reading "How to Calculate an Operation's Memory Consumption"Last modified: Saturday, September 22, 2012 5:46 pm
Pretend Invitations
Choosing between people you want to invite to a function and people you
have to invite is sometimes difficult.
Say Alice wants to invite Tom, Dick, and Harry to a party, but she'd actually
prefer if Dick didn't show up.
Here's how Alice can send invitations by email from an email-capable
Unix system to achieve the desired result,
while covering her scheming with plausible deniability.
Continue reading "Pretend Invitations"Last modified: Wednesday, December 28, 2011 12:29 am
Apps are the New Users
Some facilities provided by mature multi-user operating systems appear arcane today. Administrators of computers running Mac OS X or Linux can see users logged-in from remote terminals, they can specify limits on the disk space one can use, and they can run accounting statistics to see how much CPU time or disk I/O a user has consumed over a month. These operating systems also offer facilities to group users together, to specify various protection levels for each user's files, and to prescribe which commands a user can run.
Continue reading "Apps are the New Users"Last modified: Wednesday, December 14, 2011 5:24 pm
Code Verification Scripts
Which of my classes contain instance variables?
Which classes call the method userGet,
but don't call the method userRegister?
These and similar questions often come up when you want to verify
that your code is free from some errors.
For example, instance variables can be a problem in servlet classes.
Or you may have found a bug related to the
userGet and userRegister methods,
and you want to look for other places where this occurs.
Your IDE is unlikely to answer such questions,
and this is where a few lines in the Unix shell can save
you hours of frustration.
Continue reading "Code Verification Scripts"Last modified: Saturday, May 21, 2011 9:40 pm
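As a minimal sketch of the second question above, the following hypothetical helper lists the files that mention one name but not another. It matches plain text rather than parsed Java, so a mention inside a comment or string also counts.

```shell
# calls_without FIRST SECOND FILE...: list the files that contain the
# string FIRST but do not contain the string SECOND.
calls_without() {
    cw_first=$1
    cw_second=$2
    shift 2
    for cw_file in "$@"; do
        if grep -q -F "$cw_first" "$cw_file" &&
           ! grep -q -F "$cw_second" "$cw_file"; then
            printf '%s\n' "$cw_file"
        fi
    done
}

# Usage: calls_without userGet userRegister *.java
```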
Batch Files as Shell Scripts Revisited
Four years ago I wrote
about a method that could be used to have the Unix Bourne shell interpret
Windows batch files.
I'm using this trick a lot, because programming using the Windows/DOS
batch files facilities is decidedly painful, whereas the Bourne
shell remains a classy programming environment.
There are still many cases where the style of Unix shell programming
outshines and outperforms even modern scripting languages.
Continue reading "Batch Files as Shell Scripts Revisited"Last modified: Wednesday, August 4, 2010 11:21 pm
Useful Polyglot Code
Four years ago I blogged about an
incantation that would allow the Windows command interpreter (cmd) to execute
Unix shell scripts written inside plain batch files.
Time for an update.
Continue reading "Useful Polyglot Code"Last modified: Tuesday, January 12, 2010 6:52 pm
Tags for Bibliography References
I love writing my papers in LaTeX.
Its declarative style allows me to concentrate on the content,
rather than the form.
I even format the text according to the content,
keeping each phrase or logical unit on a separate line.
Many publishers supply style files that format the article according
to the journal's specifications.
Even better, over the years I've created
an extensive collection
of bibliographies.
I can therefore use BibTeX to cite works with a simple command,
without having to re-enter their details.
This also allows me to use style files
to format references according to the publisher's specification.
Yet, there is still the problem of navigating from a citation to
the work's details.
Here is how I solve it.
Continue reading "Tags for Bibliography References"Last modified: Thursday, October 15, 2009 9:25 am
Applied Code Reading: Debugging FreeBSD Regex
When the code we're trying to
read is inscrutable,
inserting print statements and running various test cases can be
two invaluable tools.
Earlier today I fixed
a tricky problem in the FreeBSD regular expression library.
The code,
originally written by Henry Spencer in the early 1990s,
is by far the most complex I've ever encountered.
It implements sophisticated algorithms with minimal commenting.
Also, to avoid code repetition and increase efficiency,
the 1200-line-long main part of the regular expression execution engine is
included in the compiled C code
three times after modifying various macros to adjust the code's behavior:
the first time the code targets small expressions and operates
with bit masks on long integers,
the second time the code handles larger expressions
by storing its data in arrays,
and the third time the code is also adjusted to handle multibyte characters.
Here is how I used test data and print statements to locate and fix the problem.
Continue reading "Applied Code Reading: Debugging FreeBSD Regex"Last modified: Wednesday, September 16, 2009 9:44 am
How to Create a Self-Referential Tweet
Yesterday Mark Reid
posted on
Twitter
a challenge:
create a self-referential tweet (one that links to itself).
He later
clarified that the
tweet should contain in its text its own identifier
(the number after the "/status/" part of the linked URL should be the tweet's own).
I decided to take up the challenge
("in order to learn a bit about the Twitter API" was my excuse),
and a few hours later I won the game by posting the first
self-referential tweet.
Here is how I did it.
Continue reading "How to Create a Self-Referential Tweet"Last modified: Wednesday, August 5, 2009 12:29 am
Fixing the Orientation of JPEG Photographs
I used to fix the orientation of my photographs through an application
that would transpose the compressed JPEG blocks.
This had the advantage of avoiding the image degradation of a
decompression and a subsequent compression.
Continue reading "Fixing the Orientation of JPEG Photographs"Last modified: Sunday, June 14, 2009 8:20 pm
Parallelizing Jobs with xargs
With multi-core processors sitting idle most of the time
and workloads always increasing,
it's important to have easy ways to make the CPUs earn their money's worth.
My colleague
Georgios Gousios
told me today how the Unix xargs command can help in this regard.
Continue reading "Parallelizing Jobs with xargs"Last modified: Wednesday, March 4, 2009 10:16 pm
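A minimal sketch of the idea follows, using the -P option of xargs, which GNU and BSD implementations support although POSIX does not specify it. The job here (squaring a number) and the function name are placeholders for real work.

```shell
# Run one job per input line, keeping up to four jobs running at once.
parallel_square() {
    xargs -P 4 -n 1 sh -c 'echo $(( $1 * $1 ))' sh
}

# Jobs finish in no particular order, so sort the results afterwards.
printf '%s\n' 1 2 3 4 | parallel_square | sort -n
```

Setting the -P argument to the number of available cores is a reasonable starting point for CPU-bound jobs.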
A Well-Tempered Pipeline
I am studying the use of open source software in industry.
One way to obtain empirical data is to look at the operating systems and
browsers used by the Fortune 1000 companies by examining browser logs.
I obtained a list of the Fortune 1000 domains and wrote a pipeline
to summarize results by going through this site's access logs.
Continue reading "A Well-Tempered Pipeline"Last modified: Sunday, January 25, 2009 7:01 pm
Monitor Process Progress on Unix
I often run file-processing commands that take many hours to
finish, and I therefore need a way to monitor their progress.
The Perkin-Elmer/Concurrent OS32 system I worked on for a couple
of years back in 1993 (don't ask)
had a facility that displayed for any executing
command the percentage of work that was completed.
When I first saw this facility working on the programs I maintained,
I couldn't believe my eyes, because I was sure that those rusty
Cobol programs didn't contain any functionality to monitor their progress.
Continue reading "Monitor Process Progress on Unix"Last modified: Monday, October 27, 2008 1:34 pm
Unzipping Files in Order
Over the past couple of years I've enjoyed listening to the
audio edition of the
Economist newspaper.
The material is superb
(although I occasionally get the feeling of listening to the
Voice of America),
the articles are read in a clear voice,
the data's encoding is plain MP3,
unencumbered by digital rights (restrictions) management silliness,
and the audio format is convenient to listen to on the metro or while jogging.
Unfortunately, the articles in the audio edition's zip file are
haphazardly ordered, which, until today, marred the enjoyment of my listening.
Continue reading "Unzipping Files in Order"Last modified: Thursday, September 11, 2008 6:05 pm
A Child's Crontab
When the time to go to sleep is approaching,
all children seem to be configured with the same crontab.
Continue reading "A Child's Crontab"Last modified: Tuesday, August 5, 2008 9:35 am
Assigning Responsibility
Over the past few days I worked over a large code body correcting various
accumulated errors and style digressions.
When I finished I wanted to see who wrote the original lines.
(It turned out I was not entirely innocent.)
Continue reading "Assigning Responsibility"Last modified: Sunday, April 20, 2008 9:34 pm
The Treacherous Power of Extended Regular Expressions
I wanted to filter out lines containing the word "line" or a double quote
from a 1GB file.
This can be easily specified as an extended regular expression,
but it turns out that I got more than I bargained for.
Continue reading "The Treacherous Power of Extended Regular Expressions"Last modified: Tuesday, August 28, 2007 10:37 am
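The filter itself can be written in at least two equivalent ways, sketched below with hypothetical function names. Which form runs faster depends on the grep implementation and the locale in use, so it is worth timing both on real data.

```shell
# Drop lines that contain the string "line" or a double quote.
drop_ere()   { grep -E -v 'line|"'; }          # one extended regular expression
drop_fixed() { grep -F -v -e 'line' -e '"'; }  # the same filter as fixed strings
```

Both read standard input and write the surviving lines to standard output, for example `drop_fixed <big.txt >filtered.txt`, where the file names are placeholders.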
Breaking into a Virtual Machine
Say you're running your business on a rented
virtual private server.
How secure is your setup?
I wouldn't expect it to be more secure than the system your server runs
on, and a simple experiment confirmed it.
Continue reading "Breaking into a Virtual Machine"Last modified: Monday, April 16, 2007 10:14 pm
Make vs Ant: Observability
I've long felt uncomfortable with ant
as a build management tool.
I thought that my uneasiness stemmed from the verbose XML used for
describing tasks, and the lack of default dependency resolution.
Today, email from a UMLGraph user
struggling with a complex ant task
made me realize another problem:
lack of observability.
Continue reading "Make vs Ant: Observability"Last modified: Thursday, March 15, 2007 4:04 pm
Cracking Software Reuse
[Newton] said, "If I have seen further than others, it is because I've stood on the shoulders of giants." These days we stand on each other's feet!
— Richard Hamming
Continue reading "Cracking Software Reuse"Last modified: Friday, December 15, 2006 11:31 am
Batch Files as Shell Scripts
Although the Unix Bourne shell offers a superb environment for combining
existing commands into sophisticated programs, using a Unix shell
as an interactive command environment under Windows can be painful.
Continue reading "Batch Files as Shell Scripts"Last modified: Friday, June 16, 2006 3:58 pm
Efficiency Will Always Matter
Many claim that today's fast CPUs and large memory capacities make
time-proven technologies that efficiently harness a computer's power irrelevant.
I beg to differ, and my experience in the last three days demonstrated
that technologies that originated in the 70s still have their place today.
Continue reading "Efficiency Will Always Matter"Last modified: Monday, April 3, 2006 0:42 am
A Clash of Two Cultures
I dug the following gem from the Usenix
HotOS X Conference
Panel titled "Do we work within existing frameworks or start from scratch?",
summarized by Prashanth Bungale.
Continue reading "A Clash of Two Cultures"Last modified: Monday, December 5, 2005 8:25 pm
Working with Unix Tools
A successful [software] tool is one that was used to do something undreamed of by its author.
— Stephen C. Johnson
Continue reading "Working with Unix Tools"Last modified: Saturday, August 9, 2014 1:09 pm
Tool Writing: A Forgotten Art?
Merely adding features does not make it easier for users to do things—it just makes the manual thicker. The right solution in the right place is always more effective than haphazard hacking.
— Brian W. Kernighan and Rob Pike
Continue reading "Tool Writing: A Forgotten Art?"Last modified: Tuesday, December 12, 2006 8:20 pm
A Pipe Namespace in the Portal Filesystem
The portal filesystem allows a daemon running as a userland program
to pass descriptors to processes that open files belonging to its
namespace.
It has been part of the *BSD operating systems since 4.4 BSD.
I recently added a pipe namespace to its FreeBSD implementation.
This allows us to
perform scatter gather operations without using temporary files,
create non-linear pipelines, and
implement file views using symbolic links.
Continue reading "A Pipe Namespace in the Portal Filesystem"Last modified: Wednesday, September 21, 2016 10:38 pm
XML Versus Text Files
The JDepend
package dependency analyzer can output its results
either as XML or as plain text.
Instead of using the XML output,
I found myself processing the text output using awk.
Am I becoming tied to old-world thinking,
or are text files easier to process?
Continue reading "XML Versus Text Files"Last modified: Friday, February 18, 2005 1:05 pm
System administration stories: The Revolt
Can a small embedded system the size of a paperback
lead a group of machines into revolt?
Apparently yes.
Continue reading "System administration stories: The Revolt"Last modified: Saturday, September 11, 2004 10:47 pm
A Unix-based Logic Analyzer
A circuit I was designing was behaving in unexpected ways:
the output of a wireless serial receiver based on Infineon's TDA5200
was refusing to drive an LS TTL load.
To debug the problem I needed an oscilloscope or a logic analyzer,
but I had none.
I searched the web and located
software to convert the PC's parallel port to a logic analyzer.
I downloaded the 900K program, but that was not the end.
Unfortunately the design of Windows 2000 does not allow direct access
to the I/O ports, so I also downloaded
a parallel port device driver and a program to give the appropriate privileges to other
programs.
Finally, I also downloaded from a third site the Borland runtime libraries
required by the logic analyzer.
Needless to say, the combination refused to work.
Continue reading "A Unix-based Logic Analyzer"Last modified: Sunday, October 26, 2003 10:52 pm