Posts Tagged Databases

 

Unix make vs Apache Airflow

In an IEEE Software “Adventures in Code” column titled Modular Data Analytics I describe the benefits and use of simple-rolap, a tool suite for relational online analytical processing. I built simple-rolap on top of the Unix make tool and a few shell scripts. With make approaching its 50th birthday, before writing the column I looked for modern, better alternatives I might be overlooking.
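
To make concrete what make contributes to such a tool suite, here is a minimal Python sketch of make's core rule, assuming file targets and modification times are the only staleness signal: a target is rebuilt when it is missing or older than any of its dependencies, and its dependencies are brought up to date first. The names is_stale, build, rules, and actions are hypothetical; this is not simple-rolap's code, which relies on make itself.

```python
import os
from typing import Callable, Dict, List, Optional, Set


def is_stale(target: str, deps: List[str]) -> bool:
    """Apply make's rule: rebuild if the target is missing or older than any dependency."""
    if not os.path.exists(target):
        return True
    target_time = os.path.getmtime(target)
    return any(os.path.getmtime(dep) > target_time for dep in deps)


def build(target: str,
          rules: Dict[str, List[str]],
          actions: Dict[str, Callable[[], None]],
          done: Optional[Set[str]] = None) -> None:
    """Bring `target` up to date, building its dependencies first (depth-first)."""
    if done is None:
        done = set()
    if target in done:
        return
    deps = rules.get(target, [])
    for dep in deps:
        build(dep, rules, actions, done)
    # Source files have no action; only derived targets are (re)built.
    if target in actions and is_stale(target, deps):
        actions[target]()
    done.add(target)
```

For example, rules = {"result.tsv": ["query.sql"]} with an action that runs the query would rerun it only after query.sql changes. Real make adds pattern rules, parallel execution (-j), and much more.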

Fast database UPDATE/DELETE operations

You may be familiar with the database upsert or MERGE operation, which inserts a record into a table or, if that record already exists, updates it. This evaluates the condition for finding the record only once, and is therefore more efficient than alternatives such as a separate lookup followed by an INSERT or UPDATE. How can you efficiently handle the reverse operation: updating a record and then deleting it if some condition holds?
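
As a baseline, the following Python sketch uses SQLite (version 3.24 or later, which supports INSERT ... ON CONFLICT ... DO UPDATE) to show both operations on a hypothetical stock table. The upsert locates the row once; the naive reverse operation runs an UPDATE followed by a DELETE, so it locates the row twice. It is only a point of comparison, not the faster technique the full post describes.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical stock table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stock (item TEXT PRIMARY KEY, qty INTEGER NOT NULL)")
    return conn


def add_stock(conn: sqlite3.Connection, item: str, qty: int) -> None:
    """Upsert: insert the row, or add to its quantity if the key already exists."""
    with conn:
        conn.execute(
            "INSERT INTO stock (item, qty) VALUES (?, ?) "
            "ON CONFLICT(item) DO UPDATE SET qty = qty + excluded.qty",
            (item, qty))


def remove_stock(conn: sqlite3.Connection, item: str, qty: int) -> None:
    """Naive reverse operation: decrement, then delete the row if nothing is left."""
    with conn:  # both statements run in a single transaction
        conn.execute("UPDATE stock SET qty = qty - ? WHERE item = ?", (qty, item))
        conn.execute("DELETE FROM stock WHERE item = ? AND qty <= 0", (item,))
```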

How to monitor MySQL / MariaDB query progress

The progress indicator of long-running MySQL or MariaDB commands and queries is extremely and frustratingly coarse. In an index update I’m running now, it has been stuck in the same state for more than three hours. Thankfully, the pmonitor tool allows us to precisely monitor the progress of many commands. Here’s an example of its application on MariaDB.
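
One way to measure such progress is to compare the current offset of a file descriptor a process holds open with the size of the underlying file. The Python sketch below does this on Linux through the /proc/PID/fdinfo and /proc/PID/fd interfaces; the function name read_progress is hypothetical, and the sketch is not how pmonitor is implemented. It only makes sense for regular files that the process reads sequentially.

```python
import os


def read_progress(pid: int, fd: int) -> float:
    """Return the fraction of the regular file open as `fd` in process `pid`
    that lies before the descriptor's current offset (Linux /proc interface)."""
    pos = None
    with open(f"/proc/{pid}/fdinfo/{fd}") as info:
        for line in info:
            if line.startswith("pos:"):
                pos = int(line.split()[1])
                break
    if pos is None:
        raise ValueError(f"no offset reported for fd {fd} of pid {pid}")
    # stat() follows the /proc/PID/fd/FD symlink to the underlying file.
    size = os.stat(f"/proc/{pid}/fd/{fd}").st_size
    return pos / size if size else 1.0
```

Polling this value every few seconds for the descriptor of a large table file gives a far finer-grained progress figure than a state name that stays unchanged for hours.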

How I slashed a SQL query runtime from 380 hours to 12 with two Unix commands

I was trying to run a simple join query on MariaDB (MySQL) and its performance was horrendous. Here’s how I cut down the query’s run time from over 380 hours to under 12 hours by executing part of it with two simple Unix commands.
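
The excerpt does not name the two commands. For illustration only, here is a sort-merge join in Python, the strategy the standard Unix sort and join utilities apply to text files: after sorting both inputs once, a single linear pass pairs the rows with matching keys, instead of repeatedly looking up rows. The function name merge_join and its list-of-pairs input are assumptions of this sketch.

```python
from typing import Any, List, Sequence, Tuple


def merge_join(left: Sequence[Tuple[Any, Any]],
               right: Sequence[Tuple[Any, Any]]) -> List[Tuple[Any, Any, Any]]:
    """Inner-join two sequences of (key, value) pairs on key via sort-merge."""
    left = sorted(left, key=lambda pair: pair[0])
    right = sorted(right, key=lambda pair: pair[0])
    i = j = 0
    out = []
    while i < len(left) and j < len(right):
        lkey, rkey = left[i][0], right[j][0]
        if lkey < rkey:
            i += 1
        elif lkey > rkey:
            j += 1
        else:
            # Find the run of equal keys on each side and emit their cross product.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lkey:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lkey:
                j_end += 1
            for a in range(i, i_end):
                for b in range(j, j_end):
                    out.append((lkey, left[a][1], right[b][1]))
            i, j = i_end, j_end
    return out
```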

Modular SQL Queries with Unit Tests

I’m sure I’m not the only person on earth facing a complex and expensive analytical processing task. The one I’ve been working on for the past couple of years runs on the 98.5 GB GHTorrent data set of GitHub process data. It comprises 99 SQL queries (2599 lines of SQL code in total) and takes more than 20 hours to run on a hefty server. To make the job’s parts run efficiently and reliably I implemented simple-rolap, a bare-bones relational online analytical processing tool suite. To ensure the queries produce correct results, I wrote RDBUnit, a unit testing framework for relational database queries. Here is a quick overview of how to use the two.
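
The idea behind unit-testing a query can be sketched in a few lines of Python with SQLite: load a tiny hand-written fixture, run the query, and compare the result with the expected rows. The commits table, the query, and the helper run_query_on are hypothetical, and this is not RDBUnit's input format; it only illustrates the testing pattern.

```python
import sqlite3
from typing import Iterable, List, Tuple

# Hypothetical query under test: number of commits per project.
QUERY = """
SELECT project_id, COUNT(*) AS commits
FROM commits
GROUP BY project_id
ORDER BY project_id
"""


def run_query_on(fixture_rows: Iterable[Tuple[int, int]], query: str) -> List[tuple]:
    """Load a tiny fixture into an in-memory database and run the query on it."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE commits (id INTEGER PRIMARY KEY, project_id INTEGER)")
        conn.executemany("INSERT INTO commits (id, project_id) VALUES (?, ?)", fixture_rows)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


def test_commits_per_project() -> None:
    rows = [(1, 10), (2, 10), (3, 20)]
    assert run_query_on(rows, QUERY) == [(10, 2), (20, 1)]
```

Because each fixture holds only a handful of rows, such tests run in milliseconds, even when the production query takes hours on the full data set.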

Faking it

This column is about a tool we no longer have: the continuous rise of the CPU clock frequency. We enjoyed this trend for decades, but in recent years progress has stalled. CPUs are no longer getting faster because their makers can’t handle the heat of faster-switching transistors. Furthermore, increasing the CPU’s sophistication to execute our instructions more cleverly has hit the law of diminishing returns. Consequently, CPU manufacturers now package the constantly increasing number of transistors they can fit onto a chip into multiple cores (processing elements) and then ask us developers to put the cores to good use.


Last update: Wednesday, December 4, 2024 1:52 pm

Unless otherwise expressly stated, all original material on this page created by Diomidis Spinellis is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.