Posts Tagged Unix

 

Unix make vs Apache Airflow

In an IEEE Software “Adventures in Code” column titled Modular Data Analytics I describe the benefits and use of simple-rolap, a tool suite for relational online analytical processing. I have built simple-rolap based on the Unix make tool and a few shell scripts. With make approaching its 50th birthday, before writing the column I looked for possible modern and better alternatives I might be ignoring.

Continue reading "Unix make vs Apache Airflow"

Extending the life of TomTom wearables

TomTom recently announced it would stop operating its supporting infrastructure by the end of September following its earlier decision to exit the wearables market. This means that its products, such as sports watches, will become effectively useless, as they will no longer be able to export their activities and sync them with tracker sites. Throwing away an otherwise fine watch only because its maker decided to shut down its proprietary infrastructure seems like a sad waste. Here is how you can download the watch’s data and upload it to Strava, a popular activity tracker, using open source software.

Continue reading "Extending the life of TomTom wearables"

Rather than alchemy, methodical troubleshooting

I recently encountered a pesky problem while trying to build a React Native project under Apple’s Xcode. The build would fail with an error reporting: EMFILE: too many open files, watch. Frustratingly, all available advice on the web pointed in different (often inexplicable) directions, none of which worked. After tormenting myself with these, I decided to troubleshoot the problem methodically, which allowed me to pinpoint it and solve it with an uncommon and noteworthy application of the git bisect command. Here’s the story.

Continue reading "Rather than alchemy, methodical troubleshooting"

The Evolution of the Unix System Architecture

Unix has evolved for more than five decades, shaping modern operating systems, key software technologies, and development practices. Studying the evolution of this remarkable system from an architectural perspective can provide insights on how to manage the growth of large, complex, and long-lived software systems. In 2016 my colleague Paris Avgeriou and I embarked on this study aiming to combine his software architecture insights with my software analytics skills. Here is a brief summary of the study, which was published this month in the IEEE Transactions on Software Engineering.

Continue reading "The Evolution of the Unix System Architecture"

Reviving the 1973 Unix text to voice translator

The early Research Edition Unix versions featured a program that would turn a stream of ASCII text into utterances that could be played by a voice synthesizer. The source code of this program was lost for years. Here’s the story of how I brought it back to life.

Continue reading "Reviving the 1973 Unix text to voice translator"

Error handling under Unix and Windows

One thing that struck me when I first encountered the 4.3BSD Unix system call documentation in the 1980s was that each call was followed by an exhaustive list of the errors associated with it. Ten years later, when I was going through the Windows API, I was disappointed to see that very few functions documented their error conditions. This is a big deal.

Continue reading "Error handling under Unix and Windows"

Shell scripting for software developers

In an open online edX course on Unix tools I was running over the spring with more than a thousand registered learners, I got asked for ideas on how shell scripts can be useful. This is an intriguing question, because the course focuses mainly on performing one-off tasks in the areas of software development, data engineering, and system administration, rather than automation through shell scripts. In response, I posted how shell scripting improves my personal productivity. Here’s my take on how shell scripts are employed in diverse software development tasks. I plan to post further installments on system administration and data analytics.

Continue reading "Shell scripting for software developers"

Shell scripting for personal productivity

In an edX course on Unix tools I am running these weeks, I got asked for ideas on how shell scripts can be useful. This is a very interesting question, because the course focuses mainly on performing one-off tasks in the areas of software development, data engineering, and system administration, rather than automation through shell scripts. Here’s how I’m using shell scripting to enhance my personal productivity. I’ll post further installments regarding software development and system administration.

Continue reading "Shell scripting for personal productivity"

Seven reasons to add Unix command line expertise to your tool chest

On Tuesday March 17th 2020 my free massive open online course (MOOC) on the use of Unix command line tools for data, software, and production engineering goes live on the edX platform. Already more than one thousand participants from around the world have registered for it; you should still be able to enroll through this link. In response to the course’s announcement, seasoned researchers from around the world have commented that this is an indispensable course and that it is very hard to beat the ROI of acquiring this skillset, both for academia and industry. In an age of shiny IDEs and cool GUI tools, what are the reasons for the enduring utility and popularity of the Unix command line tools? Here’s my take.

Continue reading "Seven reasons to add Unix command line expertise to your tool chest"

Was Knuth Really Framed by Jon Bentley?

Recently, the formal methods specialist Hillel Wayne posted an interesting article discussing whether Donald Knuth was actually framed when Jon Bentley asked him to demonstrate literate programming. (Knuth came up with an 8-page long monolithic listing, whereas in a critique Doug McIlroy provided a six line shell script.) The article makes many interesting and valid points. However, one of the points raised is that the specified problem was ideal for solving with Unix tools, and that a different problem, such as “find the top K pairs of words and print the Levenshtein distance between each pair”, would be much more difficult to solve with Unix commands. As the developer of an edX massive open online course (MOOC) on the use of Unix Tools for data, software and production engineering I decided to put this claim to the test.

Continue reading "Was Knuth Really Framed by Jon Bentley?"

Convert file I/O into pipe I/O with /dev/fd

Some Unix commands read data from files or write data to files, without offering an obvious way to use them as part of a pipeline. How can you write a program to interact with such a command in a streaming fashion? This would allow your program and the command to run concurrently, without the storage and I/O overhead of a temporary file. You could create and use a named pipe, but this is a clunky solution, requiring you to create and destroy a unique underlying file name. Here’s a better approach.
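As a rough illustration of the idea named in the title (a sketch, not necessarily the approach the full post takes): on systems that provide /dev/fd, a command that insists on file name arguments can be handed /dev/fd/N, which names an already open file descriptor such as a pipe. The function name same_lines is hypothetical, and cmp merely stands in for any command that accepts only file names.

```shell
#!/bin/sh
# Sketch, assuming the system provides /dev/fd (Linux, the BSDs, macOS).
# same_lines is a hypothetical helper: it succeeds when two files hold
# the same lines, irrespective of their order, without temporary files.
same_lines() {
  # The first sort's output arrives on the group's standard input;
  # 3<&0 duplicates that pipe as descriptor 3, so that it stays reachable
  # after the second sort's pipe takes over cmp's standard input.
  sort "$1" | {
    sort "$2" | cmp -s /dev/fd/3 /dev/fd/0
  } 3<&0
}
```

For example, `same_lines old.txt new.txt && echo unchanged` (file names illustrative) runs both sorts and the comparison concurrently, with all data flowing through pipes.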

Continue reading "Convert file I/O into pipe I/O with /dev/fd"

How to monitor MySQL / MariaDB query progress

The progress indicator of MySQL or MariaDB long-running commands and queries is extremely and frustratingly coarse. In an index update I’m running now it was stuck in the same state for more than three hours. Thankfully, the pmonitor tool allows us to precisely monitor the progress of many commands. Here’s an example of its application on MariaDB.

Continue reading "How to monitor MySQL / MariaDB query progress"

Java Stream Methods and Unix Pipeline Commands: A Dictionary

While preparing my class notes for functional programming in Java I was struck by the neat correspondence between many Java Stream methods and Unix commands. I decided to organize the most common of these in a dictionary form that allows the mapping between the two. I’d very much welcome comments regarding common patterns that I’ve missed.

Continue reading "Java Stream Methods and Unix Pipeline Commands: A Dictionary"

How I slashed a SQL query runtime from 380 hours to 12 with two Unix commands

I was trying to run a simple join query on MariaDB (MySQL) and its performance was horrendous. Here’s how I cut down the query’s run time from over 380 hours to under 12 hours by executing part of it with two simple Unix commands.

Continue reading "How I slashed a SQL query runtime from 380 hours to 12 with two Unix commands"

How to Perform Set Operations on Terabyte Files

The Unix sort command can efficiently handle files of arbitrary size (think of terabytes). It does this by loading into main memory all the data that can fit into it (say 16GB), sorting that data efficiently using an O(N log N) algorithm, and then merge-sorting the chunks with a linear complexity O(N) cost. If the number of sorted chunks is higher than the number of file descriptors that the merge operation can simultaneously keep open (typically more than 1000), then sort will recursively merge-sort intermediate merged files. Once you have at hand sorted files with unique elements, you can efficiently perform set operations with them through linear complexity O(N) operations. Here is how to do it.
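One way to sketch such set operations, assuming two input files that are already sorted and contain unique lines, is with comm and sort -m, each of which makes a single pass over its inputs. The function names are hypothetical, and this is not necessarily the method the full post describes.

```shell
#!/bin/sh
# Use byte-order collation so that sort and comm agree on the ordering.
LC_ALL=C
export LC_ALL

# Hypothetical helpers; each expects two files that are already sorted
# and contain no duplicate lines.

# Intersection: lines that appear in both files.
set_intersection() { comm -12 "$1" "$2"; }

# Union: merge the sorted inputs (-m) and drop repeated lines (-u).
set_union() { sort -m -u "$1" "$2"; }

# Difference: lines that appear in the first file but not in the second.
set_difference() { comm -23 "$1" "$2"; }
```

Unsorted input would first need a pass such as `sort -u big.txt >big.sorted` (file names illustrative), run under the same LC_ALL setting as the helpers.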

Continue reading "How to Perform Set Operations on Terabyte Files"

Reviving the 1973 Unix Programmer’s Manual

The 1973 Fourth Edition of the Unix Programmer’s Manual doesn’t seem to be available online in typeset form. This is how I managed to recreate it from its source code.

Continue reading "Reviving the 1973 Unix Programmer’s Manual"

How I Recovered my Firefox Tab Groups

When I quit and restarted Firefox today I received an unwelcome shock. All my tab groups, which I maintained using the Tab Groups by Quicksaver plugin, were gone! This happened because the browser had upgraded itself to Firefox Quantum (57), whose API does not maintain backward compatibility with the one used by the plugin. Although I knew the plugin would one day stop working, I thought there would be some last-minute warning and chance to export the tab groups.

Continue reading "How I Recovered my Firefox Tab Groups"

The Origins of Malloc

The 1973 Fourth Edition Unix kernel source code contains two routines, malloc and mfree, that manage the dynamic allocation and release of main memory blocks for in-memory processes and of continuous disk swap area blocks for swapped-out processes. Their implementation and history can teach us many things regarding modern computing.

Continue reading "The Origins of Malloc"

Display Git’s and Current Directory on Terminal Bar

I typically have more than ten windows open on my desktop and rely on their names to select them. Being a command-line aficionado, most of them are terminals. I have them configured to display the current directory by setting the bash PROMPT_COMMAND environment variable to 'printf "\033]0;%s:%s\007" "${HOSTNAME%%.*}" "${PWD/#$HOME/~}"'. The problem is that the directory I’m often in has a generic name, such as src or doc, so the terminal’s name isn’t very useful.

Continue reading "Display Git’s and Current Directory on Terminal Bar"

Unix Architecture Evolution Diagrams

Today I put online two diagrams depicting the architecture of the Unix operating system, one for the 1972 First Research Edition and one for FreeBSD, one of its direct descendants. Here are the details on how I created these diagrams.

Continue reading "Unix Architecture Evolution Diagrams"

The 1980s Research Unix Editions Are Now Available for Study

In 2002 Caldera International licensed the source code distribution of several historic Unix editions. This included all Research Unix editions up to the Seventh Edition, but excluded the 1980s 8th, 9th, and 10th Editions. This was unfortunate, because these editions pioneered or implemented several features that were very advanced at the time, such as streams inter-process communication, graphics terminals and the associated Sam text editor, network filesystems, and graphics typesetting tools.

Continue reading "The 1980s Research Unix Editions Are Now Available for Study"

How to avoid redoing manual corrections

Say you have an automated process to create a report, which you then have to polish by hand, because there are adjustments that require human judgment. After three hours of polishing, you realize that the report is full of errors due to a bug in the initial reporting process. Is there a way to salvage the three hours of work you put into it?

Continue reading "How to avoid redoing manual corrections"

Verifying the Substitution Cipher Folklore

A substitution cipher has each letter substituted with another. Cryptography folklore has it that simple substitution ciphers are trivial to break by looking at the letter frequencies of the encrypted text. I tested the folklore and the results were not quite what I was expecting.

Continue reading "Verifying the Substitution Cipher Folklore"

The Birth of Standard Error

Earlier today Stephen Johnson, in a mailing list run by The Unix Heritage Society, described the birth of the standard error concept: the idea that a program's error output is sent on a channel different from that of its normal output. Over the past forty years, all major operating systems and language libraries have embraced this concept.

Continue reading "The Birth of Standard Error"

How to Calculate an Operation's Memory Consumption

How can you determine how much memory is consumed by a specific operation of a Unix program? Valgrind's Massif subsystem could help you in this regard, but it can be difficult to isolate a specific operation from Massif's output. Here is another, simpler way.

Continue reading "How to Calculate an Operation's Memory Consumption"

Pretend Invitations

Choosing between people you want to invite to a function and people you have to invite is sometimes difficult. Say Alice wants to invite Tom, Dick, and Harry to a party, but she'd actually prefer if Dick didn't show up. Here's how Alice can send invitations by email from an email-capable Unix system to achieve the desired result, while covering her scheming with plausible deniability.

Continue reading "Pretend Invitations"

Apps are the New Users

Some facilities provided by mature multi-user operating systems appear arcane today. Administrators of computers running Mac OS X or Linux can see users logged-in from remote terminals, they can specify limits on the disk space one can use, and they can run accounting statistics to see how much CPU time or disk I/O a user has consumed over a month. These operating systems also offer facilities to group users together, to specify various protection levels for each user's files, and to prescribe which commands a user can run.

Continue reading "Apps are the New Users"

Code Verification Scripts

Which of my classes contain instance variables? Which classes call the method userGet, but don't call the method userRegister? These and similar questions often come up when you want to verify that your code is free from some errors. For example, instance variables can be a problem in servlet classes. Or you may have found a bug related to the userGet and userRegister methods, and you want to look for other places where this occurs. Your IDE is unlikely to answer such questions, and this is where a few lines in the Unix shell can save you hours of frustration.
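The second question can be approximated with a few lines of shell. This is a sketch under stated assumptions: the sources are .java files below a directory passed as an argument, one file stands for one class, and a textual match on the method name followed by an opening parenthesis is taken to mean a call. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list the Java files under directory $1 that
# contain a call to userGet but no call to userRegister.
# It matches text, not syntax, so comments and strings also count.
calls_get_without_register() {
  find "$1" -type f -name '*.java' -exec grep -l 'userGet *(' {} + |
  while IFS= read -r file; do
    # Keep only the files that lack any userRegister call
    grep -q 'userRegister *(' "$file" || printf '%s\n' "$file"
  done
}
```

Running `calls_get_without_register src` (directory name illustrative) prints one candidate file per line, which can then be inspected by hand.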

Continue reading "Code Verification Scripts"

Batch Files as Shell Scripts Revisited

Four years ago I wrote about a method that could be used to have the Unix Bourne shell interpret Windows batch files. I'm using this trick a lot, because programming using the Windows/DOS batch file facilities is decidedly painful, whereas the Bourne shell remains a classy programming environment. There are still many cases where the style of Unix shell programming outshines and outperforms even modern scripting languages.

Continue reading "Batch Files as Shell Scripts Revisited"

Useful Polyglot Code

Four years ago I blogged about an incantation that would allow the Windows command interpreter (cmd) to execute Unix shell scripts written inside plain batch files. Time for an update.

Continue reading "Useful Polyglot Code"

Tags for Bibliography References

I love writing my papers in LaTeX. Its declarative style allows me to concentrate on the content, rather than the form. I even format the text according to the content, keeping each phrase or logical unit on a separate line. Many publishers supply style files that format the article according to the journal's specifications. Even better, over the years I've created an extensive collection of bibliographies. I can therefore use BibTeX to cite works with a simple command, without having to re-enter their details. This also allows me to use style files to format references according to the publisher's specification. Yet, there is still the problem of navigating from a citation to the work's details. Here is how I solve it.

Continue reading "Tags for Bibliography References"

Applied Code Reading: Debugging FreeBSD Regex

When the code we're trying to read is inscrutable, inserting print statements and running various test cases can be two invaluable tools. Earlier today I fixed a tricky problem in the FreeBSD regular expression library. The code, originally written by Henry Spencer in the early 1990s, is by far the most complex I've ever encountered. It implements sophisticated algorithms with minimal commenting. Also, to avoid code repetition and increase efficiency, the 1200 line long main part of the regular expression execution engine is included in the compiled C code three times after modifying various macros to adjust the code's behavior: the first time the code targets small expressions and operates with bit masks on long integers, the second time the code handles larger expressions by storing its data in arrays, and the third time the code is also adjusted to handle multibyte characters. Here is how I used test data and print statements to locate and fix the problem.

Continue reading "Applied Code Reading: Debugging FreeBSD Regex"

How to Create a Self-Referential Tweet

Yesterday Mark Reid posted on Twitter a challenge: create a self-referential tweet (one that links to itself). He later clarified that the tweet should contain in its text its own identifier (the number after the "/status/" bit of its own URL). I decided to take up the challenge ("in order to learn a bit about the Twitter API" was my excuse), and a few hours later I won the game by posting the first self-referential tweet. Here is how I did it.

Continue reading "How to Create a Self-Referential Tweet"

Fixing the Orientation of JPEG Photographs

I used to fix the orientation of my photographs through an application that would transpose the compressed JPEG blocks. This had the advantage of avoiding the image degradation of a decompression and a subsequent compression.

Continue reading "Fixing the Orientation of JPEG Photographs"

Parallelizing Jobs with xargs

With multi-core processors sitting idle most of the time and workloads always increasing, it's important to have easy ways to make the CPUs earn their money's worth. My colleague Georgios Gousios told me today how the Unix xargs command can help in this regard.
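A minimal sketch of the idea, assuming an xargs that supports the -P (maximum parallel processes) option, which GNU and BSD xargs provide although POSIX does not specify it. The function name and the choice of four jobs are illustrative only.

```shell
#!/bin/sh
# Hypothetical helper: run COMMAND once per whitespace-separated item
# read from standard input, keeping up to NJOBS processes running.
# Usage: run_parallel NJOBS COMMAND [ARG...]
run_parallel() {
  njobs=$1
  shift
  # -n 1 passes one item per invocation; -P caps the concurrent processes
  xargs -n 1 -P "$njobs" "$@"
}

# Example: square three numbers with up to four concurrent shells.
# Output lines may appear in any order, because the jobs run concurrently.
printf '%s\n' 1 2 3 | run_parallel 4 sh -c 'echo $(( $1 * $1 ))' sh
```

Because xargs splits its input on whitespace and interprets quotes, items containing spaces would need extra care, for example the -0 option that GNU and BSD xargs also offer.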

Continue reading "Parallelizing Jobs with xargs"

A Well-Tempered Pipeline

I am studying the use of open source software in industry. One way to obtain empirical data is to look at the operating systems and browsers used by the Fortune 1000 companies by examining browser logs. I obtained a list of the Fortune 1000 domains and wrote a pipeline to summarize results by going through this site's access logs.

Continue reading "A Well-Tempered Pipeline"

Monitor Process Progress on Unix

I often run file-processing commands that take many hours to finish, and I therefore need a way to monitor their progress. The Perkin-Elmer/Concurrent OS32 system I worked on for a couple of years back in 1993 (don't ask) had a facility that displayed for any executing command the percentage of work that was completed. When I first saw this facility working on the programs I maintained, I couldn't believe my eyes, because I was sure that those rusty Cobol programs didn't contain any functionality to monitor their progress.

Continue reading "Monitor Process Progress on Unix"

Unzipping Files in Order

Over the past couple of years I've enjoyed listening to the audio edition of the Economist newspaper. The material is superb (although I occasionally get the feeling of listening to the Voice of America), the articles are read in a clear voice, the data's encoding is plain MP3, unencumbered by digital rights (restrictions) management silliness, and the audio format is convenient to listen to on the metro or while jogging. Unfortunately, the articles in the audio edition's zip file are haphazardly ordered, which, until today, marred the enjoyment of my listening.

Continue reading "Unzipping Files in Order"

A Child's Crontab

When the time to go to sleep is approaching, all children seem to be configured with the same crontab.

Continue reading "A Child's Crontab"

Assigning Responsibility

Over the past few days I worked over a large code body correcting various accumulated errors and style digressions. When I finished I wanted to see who wrote the original lines. (It turned out I was not entirely innocent.)

Continue reading "Assigning Responsibility"

The Treacherous Power of Extended Regular Expressions

I wanted to filter out lines containing the word "line" or a double quote from a 1GB file. This can be easily specified as an extended regular expression, but it turns out that I got more than I bargained for.

Continue reading "The Treacherous Power of Extended Regular Expressions"

Breaking into a Virtual Machine

Say you're running your business on a rented virtual private server. How secure is your setup? I wouldn't expect it to be more secure than the system your server runs on, and a simple experiment confirmed it.

Continue reading "Breaking into a Virtual Machine"

Make vs Ant: Observability

I've long felt uncomfortable with ant as a build management tool. I thought that my uneasiness stemmed from the verbose XML used for describing tasks, and the lack of default dependency resolution. Today, email from a UMLGraph user struggling with a complex ant task made me realize another problem: lack of observability.

Continue reading "Make vs Ant: Observability"

Cracking Software Reuse

[Newton] said, "If I have seen further than others, it is because I've stood on the shoulders of giants." These days we stand on each other's feet!

— Richard Hamming

Continue reading "Cracking Software Reuse"

Batch Files as Shell Scripts

Although the Unix Bourne shell offers a superb environment for combining existing commands into sophisticated programs, using a Unix shell as an interactive command environment under Windows can be painful.

Continue reading "Batch Files as Shell Scripts"

Efficiency Will Always Matter

Many claim that today's fast CPUs and large memory capacities make time-proven technologies that efficiently harness a computer's power irrelevant. I beg to differ, and my experience in the last three days demonstrated that technologies that originated in the 70s still have their place today.

Continue reading "Efficiency Will Always Matter"

A Clash of Two Cultures

I dug the following gem from the Usenix HotOS X Conference Panel titled "Do we work within existing frameworks or start from scratch?", summarized by Prashanth Bungale.

Continue reading "A Clash of Two Cultures"

Working with Unix Tools

A successful [software] tool is one that was used to do something undreamed of by its author.

— Stephen C. Johnson

Continue reading "Working with Unix Tools"

Tool Writing: A Forgotten Art?

Merely adding features does not make it easier for users to do things—it just makes the manual thicker. The right solution in the right place is always more effective than haphazard hacking.

— Brian W. Kernighan and Rob Pike

Continue reading "Tool Writing: A Forgotten Art?"

A Pipe Namespace in the Portal Filesystem

The portal filesystem allows a daemon running as a userland program to pass descriptors to processes that open files belonging to its namespace. It has been part of the *BSD operating systems since 4.4 BSD. I recently added a pipe namespace to its FreeBSD implementation. This allows us to perform scatter gather operations without using temporary files, create non-linear pipelines, and implement file views using symbolic links.

Continue reading "A Pipe Namespace in the Portal Filesystem"

XML Versus Text Files

The JDepend package dependency analyzer can output its results either as XML or as plain text. Instead of using the XML output, I found myself processing the text output using awk. Am I becoming tied to old-world thinking, or are text files easier to process?

Continue reading "XML Versus Text Files"

System administration stories: The Revolt

Can a small embedded system the size of a paperback lead a group of machines into revolt? Apparently yes.

Continue reading "System administration stories: The Revolt"

A Unix-based Logic Analyzer

A circuit I was designing was behaving in unexpected ways: the output of a wireless serial receiver based on Infineon's TDA5200 was refusing to drive an LS TTL load. To debug the problem I needed an oscilloscope or a logic analyzer, but I had none. I searched the web and located software to convert the PC's parallel port to a logic analyzer. I downloaded the 900K program, but that was not the end. Unfortunately the design of Windows 2000 does not allow direct access to the I/O ports, so I also downloaded a parallel port device driver and a program to give the appropriate privileges to other programs. Finally, I also downloaded from a third site the Borland runtime libraries required by the logic analyzer. Needless to say, the combination refused to work.

Continue reading "A Unix-based Logic Analyzer"


Last update: Wednesday, December 4, 2024 1:52 pm

Unless otherwise expressly stated, all original material on this page created by Diomidis Spinellis is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.