Software Maintenance: The Open Source Perspective
Diomidis Spinellis
Department of Management Science and Technology
Athens University of Economics and Business
Athens, Greece
dds@aueb.gr
Course Introduction
Welcome
Advanced Topics in Software Engineering
Course Goals
- Comprehend and use basic implementation elements
- Code
- System structures
- Architecture
- Non-functional properties
- Documentation
- Learn and use important software engineering processes
- Revision and configuration management
- Building automation
- Issue tracking
- Tooling
- Appreciate and understand maintenance activities
- Be able to change existing systems
- Make intelligent decisions regarding maintenance processes
Teaching
- Participation
- Questions
- Try to do at least one assignment / week
- Use the lab
- Home study
- Course project
Course Notes
- Equivalent to the presentation "slides"
- Accessible over the web
http://www.spinellis.gr
- Continously updated
- Links for printable version at the end of the table of contents
Course Grade
- Project-based
- Contribute to an open-source project
- Grading criteria (total 120!):
- Understanding and documentation of legacy system (10%)
- Breadth of changes (20%)
- Design quality (10%)
- Implementation quality (10%)
- Integration (10%)
- Testing (10%)
- Coordination with the development team (10%)
- Presentation (10%)
- Documentation and quality of the deliverables (10%)
- Blog (10%)
- Class participation (10%)
Schedule of Assignments
- Week 2 (March 3rd): Form teams (1-2 persons)
- Week 3 (March 16th): Select project
- Week 4 (March 20th): Existing project presentation (4-8 mins)
- Week 6 (April 3rd): Design presentation (4-8 mins)
- Week 13 (June 5th): Implementation presentation (5-10 mins)
Participation
Recommended Reading
- Diomidis Spinellis.
Code Reading: The Open
Source Perspective.
Addison-Wesley, Boston, MA, 2003. (In English or Greek.)
- Diomidis Spinellis.
Code Quality: The Open
Source Perspective.
Addison-Wesley, Boston, MA, 2006. (In English or Greek.)
- Stephanos
Androutsellis-Theotokis, Diomidis Spinellis,
Maria Kechagia, and Georgios Gousios.
Open
source software: A survey from 10,000 feet.
Foundations and Trends in Technology, Information and Operations
Management, 4(3–4):187–347, 2010.
(doi:10.1561/0200000026 (http://dx.doi.org/10.1561/0200000026))
- Manolis Diakoimakis and Nikos Diamantidis. Software Engineering. Stamoulis 2009. (In Greek).
- Martin Fowler.
Refactoring: Improving the Design of Existing Code.
Addison-Wesley, Boston, MA, 2000.
With contributions by Kent Beck, John Brant, William Opdyke, and Don Roberts.
Course Overview
- Code as Part of the Software Development Process
- The Open Source Landscape
- Tackling Large Projects
- Version Control
- General Purpose Tools
- Build Management
- Collaboration
- Performance Measurement and Management
- Code-Reading Tools
- Inspection and Testing
- Coding Standards and Conventions
- Maintainability
- Documentation and Visualization
Further Reading
- Stephanos Androutsellis-Theotokis, Diomidis Spinellis,
Maria Kechagia, and Georgios Gousios.
Open
source software: A survey from 10,000 feet.
Foundations and Trends in Technology, Information and Operations
Management, 4(3–4):187–347, 2010.
(doi:10.1561/0200000026 (http://dx.doi.org/10.1561/0200000026))
- Tavish Armstrong.
The Performance of Open
Source Applications.
2013.
- Amy Brown and Greg
Wilson.
The Architecture of Open
Source Applications.
2012.
- Brian D. Eubanks.
Wicked Cool
Java: Code Bits, Open-Source Libraries, and Project Ideas.
No Starch Press, San Francisco, 2006.
- Michael Feathers.
Working
Effectively with Legacy Code.
Prentice Hall, Englewood Cliffs, NJ, 2005.
- Martin Fowler.
Refactoring:
Improving the Design of Existing Code.
Addison-Wesley, Boston, MA, 2000.
With contributions by Kent Beck, John Brant, William Opdyke, and Don Roberts.
- Pete Goodlife.
Code Craft: The
Practice of Writing Excellent Code.
No Starch Press, San Francisco, 2007.
- Andrew Hunt and David
Thomas.
The Pragmatic
Programmer: From Journeyman to Master.
Addison-Wesley, Boston, MA, 2000.
- Andy Hunt and Dave
Thomas.
Software archeology.
IEEE Software, 19(2):20–22, March/April 2002.
- Brian W. Kernighan
and Rob Pike.
The UNIX
Programming Environment.
Prentice-Hall, Englewood Cliffs, NJ, 1984.
- Brian W. Kernighan
and Rob Pike.
The Practice of
Programming.
Addison-Wesley, Reading, MA, 1999.
- John Lions.
Lions'
Commentary on Unix 6th Edition with Source Code.
Annabooks, Poway, CA, 1996.
- Steve C McConnell.
Code Complete:
A Practical Handbook of Software Construction.
Microsoft Press, Redmond, WA, second edition, 2004.
- Andy Oram and Greg
Wilson.
Beautiful Code:
Leading Programmers Explain How They Think.
O'Reilly and Associates, Sebastopol, CA, 2007.
- Charles Petzold.
Code: The
Hidden Language of Computer Hardware and Software.
Microsoft Press, Redmond, WA, 1999.
- Eric S. Raymond.
The Cathedral
and the Bazaar: Musings on Linux and Open Source by an Accidental
Revolutionary.
O' Reilly and Associates, Sebastopol, CA, 2001.
- Eric S. Raymond.
The Art Of Unix
Programming.
Addison-Wesley, 2003.
- Sulayman K. Sowe,
Ioannis G. Stamelos, and Ioannis Samoladas, editors.
Emerging Free and Open
Source Software Practices.
IGI Publishing, Hershey, PA, 2007.
- Diomidis Spinellis
and Georgios Gousios, editors.
Beautiful Architecture:
Leading Software Engineers Explain How They Think.
O'Reilly, Sebastopol, CA, 2009.
- Diomidis
Spinellis and Clements Szyperski.
How
is open source affecting software development?.
IEEE Software, 21(1):28–33, January/February 2004.
- Diomidis Spinellis.
Code Reading: The Open
Source Perspective.
Addison-Wesley, Boston, MA, 2003.
- Diomidis Spinellis.
Code Quality: The Open
Source Perspective.
Addison-Wesley, Boston, MA, 2006.
Exercises and Discussion Topics
- Establish a plan for reading some of the recommended books.
Lend the first one from the library.
Code as Part of the Software Development Process
Code as a Scientific Communication Vehicle
Social processes that contributed to the success of mathematical theorems:
- Publication and review
- Discussion, internalizing, generalizing, paraphrasing
- Used for solving real problems
Gains from High-quality Code
We learn:
- New architectural patterns
- Data structures
- Coding methods
- Algorithms
- Style conventions
- Documentation mechanisms
- APIs
- Languages
Recognize Low-quality Code
- Inconsistent coding style
- Gratuitously complicated or unreadable structure
- Obvious logical errors, or omissions
- Overuse of non-portable constructs
- Lack of maintenance
- Inefficient or insecure
Requirements Can Lead to Strange Code
- Portability (e.g. 6 character identifiers)
- Efficiency
- Space restrictions
- Present an idea
- Protect against reverse engineering
How to Familiarize Yourself with a Body of Code
Don't Panic! (typically you don't have to understand all the code)
Four steps:
- Build
- Browse
- Improve
- Contribute
What you gain from OSS is a loan that you have to repay.
Code Reading Reasons
- Code as Exemplar
- Maintenance
- Evolution
- Porting
- Refactoring
- Reuse
- Inspections
Code as Exemplar
Reasons
- Understand how it works
- Create compatible software
Strategy
- Try multiple approaches
- Search
- Browse
- Read documentation
- Run
- Run under a debugger
- Trace
- Ignore irrelevant details
Example: tail -f (NetBSD)
while ((ch = getopt(argc, argv, "b:c:fn:r")) != -1)
switch(ch) {
[...]
case 'f':
fflag = 1;
break;
}
[...]
for (;;) {
while ((ch = getc(fp)) != EOF)
if (putchar(ch) == EOF)
oerr();
[...]
if (!fflag)
break;
/*
* We pause for one second after displaying any data that has
* accumulated since we read the file. Since sleep(3) takes
* eight system calls, use select() instead.
*/
second.tv_sec = 1;
second.tv_usec = 0;
if (select(0, NULL, NULL, NULL, &second) == -1)
err(1, "select: %s", strerror(errno));
clearerr(fp);
}
Example: tail -f (FreeBSD)
if (fflag) {
kq = kqueue();
if (kq < 0)
err(1, "kqueue");
action = ADD_EVENTS;
}
for (;;) {
while ((ch = getc(fp)) != EOF)
if (putchar(ch) == EOF)
oerr();
[...]
if (! fflag)
break;
clearerr(fp);
switch (action) {
case ADD_EVENTS:
n = 0;
ts.tv_sec = 0;
ts.tv_nsec = 0;
if (Fflag && fileno(fp) != STDIN_FILENO) {
EV_SET(&ev[n], fileno(fp), EVFILT_VNODE,
EV_ADD | EV_ENABLE | EV_CLEAR,
NOTE_DELETE | NOTE_RENAME, 0, 0);
n++;
}
EV_SET(&ev[n], fileno(fp), EVFILT_READ,
EV_ADD | EV_ENABLE | EV_CLEAR, 0, 0, 0);
n++;
if (kevent(kq, ev, n, NULL, 0, &ts) < 0) {
action = USE_SLEEP;
} else {
action = USE_KQUEUE;
}
break;
kqueue, kevent - kernel event notification mechanism
Maintenance and its Tools
- Use tools
- Compiler warnings
- Examine symbolic code
- System call tracing
- Debugger
- SQL log
- Packet dump
- Message spy
- Work from problem manifestation to problem source
- Eschew unrelated paths
- Compile with debugging options
- Run under a debugger
- Single step
- Stack trace
- Code breakpoint
- Data breakpoint
- Add print statements
Evolution
Maintenance activities often take 80% of the effort over a complete
life cycle.
Maintenance Activities
- Add new functionality
- Modify existing features
- Adapt to new environments
- Refactor (enhance nonfunctional qualities)
Strategy
- Locate code of interest
- Understand part in isolation
- Infer relationship with rest of the code
Example: locating the authentication code in the ftp command.
Porting
- Compile
- Remove compiler errors
- Remove linker errors
- Examine warnings
- Configure
- Test
If the two environments differ a lot: focus on interfaces.
Refactoring
- Leaves external behavior unchanged.
- Improves:
- Simplicity
- Flexibility
- Understandability
- Performance
- In common with cosmetic surgery: start and end with a working system.
- Use test cases
- Scenarios:
- Fix trouble spot
- Spending "quality time"
Reuse
- Limit your expectations
- Reusable software is software that has been reused
- Making software reusable can add 50% to the effort
- Strategy:
- Isolate code
- If fails, try larger granularity packaging (library, component, process, system)
- Make it a habit to look for reusable code
Inspections
- Technical review
- Walkthrough
- Inspection
- Pair programming
- Strategy:
- Thoroughness
- Examine tricky constructs
- Verify against specifications
- Remember nonfunctional properties
- Remember non-code elements
Further Reading
- Phillip G. Armour.
The case for a new business model: Is software a product or a medium?
Communications of the ACM, 43(8):19–22, August 2000.
- Kent Beck.
Extreme Programming Explained: Embrace Change.
Addison-Wesley, Boston, MA, 2000.
- Jon Louis Bentley
and Donald E. Knuth.
Programming pearls: A tt WEB program for sampling.
Communications of the ACM, 29(5):364–369, May 1986.
- Jon Louis Bentley,
Donald E. Knuth, and Douglas McIlroy.
A literate program.
Communications of the ACM, 19(6):471–483, June 1986.
- Barry W. Boehm, Bradford
Clark, Ellis Horowitz, Ray Madachy, Richard Shelby, and Chris Westland.
Cost
models for future life cycle processes: COCOMO 2.
Annals of Software Engineering, 1:57–94, 1995.
- Barry W. Boehm.
Software Engineering Economics.
Prentice-Hall, Englewood Cliffs, NJ, 1981.
- Barry W. Boehm.
The economics of software maintenance.
In Software Maintenance Workshop, pages 9–37, 1983.
- R. DeMillo, R. Lipton,
and A. Perlis.
Social processes and proofs of theorems and programs.
In Proc. Fourth ACM Symposium on Principles of Programming
Languages, pages 206–214, New York, January 1977. ACM Press.
- Khaled El-Emam.
Ethics and open source.
Empirical Software Engineering, 6(4):291–292, December 2001.
- Michael Feathers.
Working Effectively with Legacy Code.
Prentice-Hall, Englewood Cliffs, NJ, 2005.
- Martin Fowler.
Refactoring: Improving the Design of Existing Code.
Addison-Wesley, Boston, MA, 2000.
With contributions by Kent Beck, John Brant, William Opdyke, and Don
Roberts.
- Richard P. Gabriel
and Ron Goldman.
Mob software: The erotic
life of code.
Presented at the ACM Conference on Object-Oriented Programming, Systems,
Languages, and Applications on October 19, 2000, in Minneapolis, Minnesota.,
October 2000.
Online http://www.dreamsongs.com/MobSoftware.html. Current May 2002.
- Robert L. Glass.
The sociology of open source: Of cults and cultures.
IEEE Software, 17(3):104–105, May/June 2000.
- Eric Hamilton.
Literate programming—expanding generalized regular expressions.
Communications of the ACM, 31(12):1376–1385, December
1988.
- David R. Hanson.
Literate programming—printing common words.
Communications of the ACM, 30(7):594–599, July 1987.
- Andy Hunt and Dave
Thomas.
Software archeology.
IEEE Software, 19(2):20–22, March/April 2002.
- Michael A. Jackson.
Literate programming—processing transactions.
Communications of the ACM, 30(12):1000–1010, December
1987.
- Brian W. Kernighan
and P. J. Plauger.
Software Tools.
Addison-Wesley, Reading, MA, 1976.
- Gregor Kiczales, Erik
Hilsdale, Jim Hugunin, Mik Kersten, Jeffrey Palm, and William G. Griswold.
Getting started with AspectJ.
Communications of the ACM, 44(10):59–65, October 2001.
- Donald E. Knuth.
METAFONT: The Program.
Addison-Wesley, Reading, MA, 1986.
- Donald E. Knuth.
TeX:
The Program.
Addison-Wesley, Reading, MA, 1986.
- Donald E. Knuth.
Literate Programming.
CSLI Lecture Notes Number 27. Stanford University Center for the Study of
Language and Information, Stanford, CA, 1992.
Distributed by the University of Chicago Press.
- Charles W. Krueger.
Software reuse.
ACM Computing Surveys, 24(2):131–183, June 1992.
- B. P. Lientz and
E. B. Swanson.
Software Maintenance Management.
Addison-Wesley, Reading, MA, 1980.
- John Lions.
Lions'
Commentary on Unix 6th Edition with Source Code.
Annabooks, Poway, CA, 1996.
- Hafedh Mili, Fatma Mili,
and Ali Mili.
Reusing software: Issues and research directions (http://dlib.computer.org/dynaweb/ts/ts1995/@Generic__BookTextView/101764;hf=0).
IEEE Transactions on Software Engineering, 21(6):528–562, June
1995.
- Charles Petzold.
Code:
The Hidden Language of Computer Hardware and Software.
Microsoft Press, Redmond, WA, 1999.
- Roger S. Pressman.
Software Engineering: A Practitioner's Approach.
McGraw-Hill, London, fifth edition, 2000.
European Adaptation. Adapted by Darrel Ince.
- Cristiane S. Ramos,
Káthia M. Oliveira, and Nicolas Anquetil.
Legacy software evaluation model for outsourced maintainer.
In Eighth Euromicro Working Conference on Software Maintenance and
Reengineering (CSMR'04), pages 48–57. IEEE Computer Society, March
2004.
- Eric S. Raymond.
The
Cathedral and the Bazaar: Musings on Linux and Open Source by an Accidental
Revolutionary.
O' Reilly and Associates, Sebastopol, CA, 2001.
- Diomidis Spinellis
and Konstantinos Raptis.
Component mining: A process and its pattern language ( http://www.spinellis.gr/pubs/jrnl/2000-IST-Components/html/comp.html).
Information and Software Technology, 42(9):609–617, June
2000.
- Diomidis Spinellis.
Explore, excogitate, exploit: Component mining ( http://www.spinellis.gr/pubs/jrnl/1999-Computer-Components/html/comp.html).
IEEE Computer, 32(9):114–116, September 1999.
- Diomidis Spinellis.
Code Reading: The Open
Source Perspective, pages 1–17.
Effective Software Development Series. Addison-Wesley, Boston, MA, 2003.
- Christopher J. Van Wyk
and Donald C. Lindsay.
Literate programming: A file difference program.
Communications of the ACM, 32(6):740–755, June 1989.
Exercises and Discussion Topics
- Are there examples of "open-source" communication in other
engineering disciplines?
- Discuss possible business models behind an open-source distribution.
- Download, build from source, and install
a small open-source program. (easy)
- Install an open-source operating system on your PC (e.g.
FreeBSD (http://www.freebsd.org),
Debian (http://www.debian.org),
NetBSD (http://www.netbsd.org)).
(moderate)
- Modify your system's kernel to log all accesses to files in the
/etc directory. (difficult)
The Open Source Landscape
History
- Initially software was tied to the hardware and free
- Organized collections: SHARE, ACM collected algorithms on tape
- 1970s Academic distribution of Unix in source code form
- 1983: Richard Stallman and the GNU Project
- Late 1980s: X Window System
- 1991: Linus Torvalds and the Linux kernel
- 1998: Netscape releases Mozilla; the Open-Source Movement
Free and "Free" Software
- Public Domain Software
- Open-Source Software
- Shared-Source Software
- Free Software
- Shareware
- Adware
- Spyware
The Open Source Definition
- Free redistribution
- Easy availability of source code
- Allow derived works
- Integrity of the author's source code
- No discrimination against persons or groups
- No discrimination against fields of endeavor
- Automatic distribution of license with the software
- License must not be specific to a product
- License must not restrict other software
Software Categories (FreeBSD Ports)
Top-40 package categories (by population) in the FreeBSD Ports
distribution.
Category | Number of Packages |
| devel | 3439 |
| www | 2061 |
| textproc | 1346 |
| net | 1215 |
| games | 1108 |
| graphics | 1029 |
| sysutils | 1028 |
| security | 907 |
| audio | 894 |
| databases | 809 |
| mail | 773 |
| misc | 578 |
| math | 569 |
| x11 | 486 |
| japanese | 403 |
| lang | 389 |
| distfiles | 361 |
| print | 360 |
| multimedia | 355 |
| x11-toolkits | 319 |
| deskutils | 308 |
| net-mgmt | 290 |
| editors | 271 |
| x11-themes | 216 |
| emulators | 202 |
| archivers | 196 |
| x11-wm | 187 |
| net-im | 186 |
| science | 170 |
| java | 165 |
| comms | 165 |
| x11-fonts | 155 |
| irc | 149 |
| dns | 148 |
| converters | 145 |
| chinese | 144 |
| net-p2p | 142 |
| ftp | 126 |
| astro | 122 |
| news | 105 |
cd /usr/ports
find * -prune -type d |
while read i
do
echo "$i `ls $i | wc -l`"
done |
sort -rn +1 |
head -40
System Software
Operating Systems
- The Linux kernel
- Linux distributions
- FreeBSD
- NetBSD
- OpenBSD
- DragonFly
- Mach
- Plan 9
- Hurd
- Microsoft Windows (if you are a government)
Databases
- mySQL
- PostgreSQL
- SQLite
- HSQLDB
Emulators
- Bochs X86 emulator
- Wine (Wine Is Not an Emulator)
- Samba
- Cygwin
Language Processors
- The GNU compiler collection
- Perl
- Python
- Ruby
- PHP
- Tcl/Tk
- Mono (.NET re-implementation)
- Shells
Graphics
Applications and Libraries
- Ghostscript
- PNG
- GD
- FreeType
- Gnuplot
- GraphViz
- GMT
- Gimp
- Inkscape
- VLC or Mplayer
Environments
- X Window System
- KDE
- Gnome
Development Tools
- GNU binary and text tools
- GNU Emacs
- vim
- jEdit
- Eclipse
- KDevelop
- Dev-C++
- JUnit
- Boost C++ Libraries
- dOxygen
- autoconf
- bison and flex
- make / ant
- Bugzilla
- git / Subversion / CVS
Text Processing
- TeX / LaTeX / MiKTeX
- Docbook
- groff
- aspell
- Libre Office / Apache OpenOffice / Calligra Suite
- Expat XML Parser
Web and Application Servers
- Apache Web Server
- Jakarta Tomcat
- JetSpeed
- JBoss
Desktop Applications
- Putty
- Xine
- Open Office
- Gnumeric
- Evolution (calendar, meeting, ...)
- Firefox (web browser)
- Thunderbird (email client)
- Pidgin (instant messaging)
Open Source Software Forges
Influence on Product Development
Advantages
- Many existing elements available for reuse (what)
- Flexible reuse granularity (how)
- Easier porting (where)
- Improve, fix, support
Problems
- Divergent evolution
- Code bloat
- Inadvertently deep dependencies
- Dependency management
- Licensing
- Quality control
Development Process Advantages
- Adoption of sophisticated development platforms and tools
- operating systems: GNU/Linux and FreeBSD,
- databases: PostgreSQL and MySQL,
- application servers: JBoss,
- optimizing compilers: gcc,
- integrated development environments: Eclipse KDevelop,
- build managers: make and ant,
- version control management systems: git, Subversion.
- Adoption of software engineering processes
- version control
- peer reviews
- issue tracking
- release engineering
- regression testing.
- Learn from the source
- Easier procurement
- Adherence to open standards
Development Process Problems
- Contribute to maintenance
- Forks: effort duplication and waste
- Integration of multiple developlemnt processes
- Less attention to
- strategic planning
- detailed requirement elicitation
- testing
- organized support.
License Distribution (Sourceforge)
License | Number |
GNU General Public License (GPL) | 35807 |
GNU Library or Lesser General Public License (LGPL) | 5447 |
BSD License | 3587 |
Artistic License | 1119 |
Apache Software License | 905 |
MIT License | 881 |
Mozilla Public License 1.1 (MPL 1.1) | 539 |
Common Public License | 265 |
Mozilla Public License 1.0 (MPL) | 260 |
zlib/libpng License | 245 |
Qt Public License (QPL) | 205 |
Open Software License | 184 |
Python License (CNRI Python License) | 159 |
Academic Free License (AFL) | 131 |
Python Software Foundation License | 80 |
IBM Public License | 70 |
PHP License | 49 |
Apple Public Source License | 44 |
Sun Industry Standards Source License (SISSL) | 43 |
Sun Public License | 38 |
Jabber Open Source License | 36 |
wxWindows Library Licence | 33 |
University of Illinois/NCSA Open Source License | 29 |
Zope Public License | 25 |
Nethack General Public License | 24 |
W3C License | 21 |
Intel Open Source License | 19 |
Open Group Test Suite License | 14 |
Sleepycat License | 13 |
Apache License V2.0 | 12 |
Eiffel Forum License | 10 |
Eiffel Forum License V2.0 | 10 |
Attribution Assurance License | 9 |
Reciprocal Public License | 7 |
Ricoh Source Code Public License | 6 |
Historical Permission Notice and Disclaimer | 6 |
Legal Exposure
- Proprietary code coexistsence with GPL
- Distribution of copyrights
- Assuming liability for all the code use
- Dynamic loading of code has is own licensing issues
- The BSD-like licenses allow for proprietary forks
- Packaging companies often offer value
Web sites
Further Reading
- Stephanos Androutsellis-Theotokis, Diomidis Spinellis,
Maria Kechagia, and Georgios Gousios.
Open
source software: A survey from 10,000 feet.
Foundations and Trends in Technology, Information and Operations
Management, 4(3–4):187–347, 2010.
(doi:10.1561/0200000026 (http://dx.doi.org/10.1561/0200000026))
- Michael A. Cusumano.
The Business of
Software: What Every Manager, Programmer, and Entrepreneur Must Know to
Thrive and Survive in Good Times and Bad.
The Free Press, New York, 2004.
- J. Feller and
B. Fitzgerald.
Understanding Open Source Software Development.
Addison-Wesley, Reading, MA, 2001.
- E. Von Hippel.
Innovation by user communities: Learning from open source software.
Sloan Management Review, 42(4):82–86, Summer 2001.
- Maria Kechagia,
Diomidis Spinellis, and Stephanos Androutsellis-Theotokis.
Open source licensing across
package dependencies.
In Costas Vassilakis and Nikolaos Tselikas, editors, PCI 2010:
Proceedings of 14th Panhelenic Conference on Informatics, pages
27–32, Los Alamitos, CA, USA, September 2010. IEEE Computer Society.
(doi:10.1109/PCI.2010.28 (http://dx.doi.org/10.1109/PCI.2010.28))
- Brian W. Kernighan
and P. J. Plauger.
Software
Tools.
Addison-Wesley, Reading, MA, 1976.
- John Lions.
Lions'
Commentary on Unix 6th Edition with Source Code.
Annabooks, Poway, CA, 1996.
- T. R. Madanmohan and
Rahul De'.
Open source reuse in commercial firms.
IEEE Software, 21(6):62–69, November/December 2004.
(doi:10.1109/MS.2004.45 (http://dx.doi.org/10.1109/MS.2004.45))
- David G.
Messerschmitt and Clemens Szyperski.
Software Ecosystem—Understanding An Indispensable Technology and
Industry.
MIT Press, Cambridge, 2003.
- Eric S. Raymond.
The Cathedral
and the Bazaar: Musings on Linux and Open Source by an Accidental
Revolutionary.
O' Reilly and Associates, Sebastopol, CA, 2001.
- Eric S. Raymond.
The Art Of Unix
Programming.
Addison-Wesley, 2003.
- J. Sandred.
Managing Open Source Projects.
John Wiley and Sons, New York, 2001.
- Diomidis
Spinellis and Clements Szyperski.
How
is open source affecting software development?.
IEEE Software, 21(1):28–33, January/February 2004.
- Diomidis Spinellis.
Open
source and professional advancement.
IEEE Software, 23(5):70–71, September/October 2006.
(doi:10.1109/MS.2006.136 (http://dx.doi.org/10.1109/MS.2006.136))
- Diomidis Spinellis.
Cracking
software reuse.
IEEE Software, 24(1):12–13, January/February 2007.
(doi:10.1109/MS.2007.9 (http://dx.doi.org/10.1109/MS.2007.9))
- Diomidis Spinellis.
Choosing
and using open source components.
IEEE Software, 28(3):96, 95, May/June 2011.
(doi:10.1109/MS.2011.54 (http://dx.doi.org/10.1109/MS.2011.54))
Exercises and Discussion Topics
- Examine which software categories are over or underepresented
in open source software repositories.
Discuss why this might be the case.
- What criteria will you use for determining the project
to contribute?
- Describe the business model behind a packaging company.
Is a similar business model used in another, non-software, area?
Why (not)?
- Consider adopting some of the programs we described to improve
your productivity.
- Learn a scripting language, like Ruby, Python, or Perl.
- Compile a Swiss-army-knife CD with all the open source software you
would want to have with you on a desert island.
The Full Open Source Definition
Introduction
Open source doesn't just mean access to the source code. The
distribution terms of open-source software must comply with the
following criteria:
1. Free Redistribution
The license shall not restrict any party from selling or giving away the
software as a component of an aggregate software distribution containing
programs from several different sources. The license shall not require a
royalty or other fee for such sale.
2. Source Code
The program must include source code, and must allow distribution in
source code as well as compiled form. Where some form of a product is
not distributed with source code, there must be a well-publicized
means of obtaining the source code for no more than a reasonable
reproduction cost preferably, downloading via the Internet without
charge. The source code must be the preferred form in which a
programmer would modify the program. Deliberately obfuscated source
code is not allowed. Intermediate forms such as the output of a
preprocessor or translator are not allowed.
3. Derived Works
The license must allow modifications and derived works, and must allow
them to be distributed under the same terms as the license of the original
software.
4. Integrity of The Author's Source Code
The license may restrict source-code from being distributed in modified
form only if the license allows the distribution of "patch files" with
the source code for the purpose of modifying the program at build time.
The license must explicitly permit distribution of software built from
modified source code. The license may require derived works to carry a
different name or version number from the original software.
5. No Discrimination Against Persons or Groups
The license must not discriminate against any person or group of persons.
6. No Discrimination Against Fields of Endeavor
The license must not restrict anyone from making use of the program in
a specific field of endeavor. For example, it may not restrict the program
from being used in a business, or from being used for genetic research.
7. Distribution of License
The rights attached to the program must apply to all to whom the program
is redistributed without the need for execution of an additional license
by those parties.
8. License Must Not Be Specific to a Product
The rights attached to the program must not depend on the program's being
part of a particular software distribution. If the program is extracted
from that distribution and used or distributed within the terms of the
program's license, all parties to whom the program is redistributed should
have the same rights as those that are granted in conjunction with the
original software distribution.
9. License Must Not Restrict Other Software
The license must not place restrictions on other software that is
distributed along with the licensed software. For example, the
license must not insist that all other programs distributed on the
same medium must be open-source software.
*10. License Must Be Technology-Neutral
No provision of the license may be predicated on any individual technology or style of interface.
Origins: Bruce Perens wrote the first draft of this document as
"The Debian Free Software Guidelines", and refined it using the comments of the
Debian developers in a month-long e-mail conference in June, 1997. He removed
the Debian-specific references from the document to create the "Open Source
Definition."
Copyright © 2004 by the Open Source Initiative (http://www.opensource.org)
Tackling Large Projects
Design and Implementation Techniques
- Visible software process
- Disciplined practices
- Programming style
- Organization
- Documentation
- Processes
- Nontrivial architecture
- Merciless decomposition
- Support for multiple platforms
- Libraries, components, processes
- Custom, domain-specific languages and tools
- Implementation style
- Agressive use of the C preprocessor
- Object-oriented techniques
- Operator overloading
FreeBSD 5.2 Release Policy
Subject: 5.2 is coming!!
Date: Wed, 7 Jan 2004 23:57:22 -0700 (MST)
From: Scott Long <scottl@FreeBSD.org>
To: developers@FreeBSD.org
All,
We've delayed quite bit while install problems and various show-stopper
bugs got fixed. I think that it's safe to say that we are getting
pretty close to closing the door on 5.2 now. There are still a few MFC
requests trickling in, so I wanted to publicize the policy that needs
to be followed for the remainder of the 5.2 cycle.
First, there can be no 'instant MFCs'. All 5.2 MFC candidates must spend
at least 36 hours in HEAD before going into RELENG_5_2. It's fine to
contact re@ after committing to HEAD to state your desire to also commit
to RELENG_5_2, but don't expect instant approval. I encourage the 36
hours to be used for as much review, testing, and verification as
possible.
Unless another show-stopper arises, I intend to tag the tree late
Friday night or early Saturday morning (MST -0700). In accordance
with the previous paragraph, this means that there are approximately
12 more hours for bug fixes to go in and be 5.2 candidates. This is
_not_ intended to be a signal for people to start rushing stuff in,
but rather a reminder that we cannot hold the release up for forever.
MFC approval is still at the discretion of the RE team. After this
cut-off, only show-stoppers will be considered for approval.
While 5.2 will certainly have bugs, I'm happy to hold the release up for
true show-stoppers. A show-stopper generally is a bug that causes a panic
[crash] during normal and reasonable operation of the OS or prevents the
OS from being installed in a way that is documented, and has no reasonable
work-around. These bugs must be confirmed by a third party to ensure
that it is a problem that is likely to be reproduced in the user base. It
also includes security vulnerablilities that are published by mainstream
security outlets and have not already been addressed in some fashion.
Thanks a lot!
Scott
Directory Structure
- src
- TLD (e.g. .org)
- main / program-name
- lib
- common
- include
- doc
- man
- arch / os
- .git / .svn / CVS
- build / compile / classes / obj / Release / Debug
- tools
- test
- conf
- etc
- eg
- bin
- contrib
Project Organization
Two approaches:
- Use a single structure for all parts (doc/named)
- Replicate directory structure for each part (named/doc)
Keep in mind: large structures are often very easy to navigate.
- 3000 directories.
- Source is related to the installation path.
Source Code is Not Only Code
- Specifications
- Design documentation (rarely)
- End user documentation
- Developer documentation
- Test scripts
- Multimedia resources
- Build tools
- Examples
- Localization files
- Revision history
- Installation procedures
- Licensing information
Key Processes
- Configuration management
- Building automation
- Issue tracking
- Tooling
Project-Specific Tools
- Used to automate the build process (X: imake, makedepend)
- Generate specialized code
- Handle regression tests
- Automatically generate documentation (javadoc, pod)
Building the IBM 3270 terminal emulator
Debug Messages
- Debug messages can help us understand the rest of the code
- Console debug messages:
#ifdef DEBUG
if (trace == NULL)
trace = fopen("bgtrace", "w");
fprintf(trace, "\nRoll: %d %d%s\n", D0, D1, race ? " (race)" : "");
fflush(trace);
#endif
- System log messages:
syslog(LOG_DEBUG, "Successful lookup: %d , %d : %s\n",
lport, fport, pwp->pw_name);
- Different debug levels:
if (debug > 4)
printf("systime: offset %s\n", lfptoa(now, 6));
Logging
Logging offers a number of advantages over using a debugger:
- Placing and output is program-specific
- Can perform data-structure traversals
printf(PDQ_OS_PREFIX "FDDI Port%s = %c (PMD = %s)",
PDQ_OS_PREFIX_ARGS,
rsp->status_chars_get.station_type == PDQ_STATION_TYPE_DAS ? "[A]" : "",
pdq_phy_types[rsp->status_chars_get.phy_type[0]],
pdq_pmd_types[rsp->status_chars_get.pmd_type[0] / 100]
[rsp->status_chars_get.pmd_type[0] % 100]);
- Stored with the program source code
- Careful formatting
printf("\n");
printf("%s %s trap:\n", isfatal? "fatal" : "handled",
user ? "user" : "kernel");
printf("\n");
printf(" trap entry = 0x%lx (%s)\n", entry, entryname);
printf(" a0 = 0x%lx\n", a0);
printf(" a1 = 0x%lx\n", a1);
printf(" a2 = 0x%lx\n", a2);
printf(" pc = 0x%lx\n", framep->tf_regs[FRAME_PC]);
printf(" ra = 0x%lx\n", framep->tf_regs[FRAME_RA]);
printf(" curproc = %p\n", curproc);
if (curproc != NULL)
printf(" pid = %d, comm = %s\n", curproc->p_pid,
curproc->p_comm);
printf("\n");
- Adjustable and filterable (syslog, log4j, Apple logging)
import org.apache.log4j.*;
Argo.log.info ("Loading modules from " + file);
Argo.log.warn("Could not instantiate module " + classname);
Argo.log.warn ("Class " + pluginType.getName() + " is not a core Argo pluggable type.");
- Useful when a debugger can't be used
Assertions
- Can be used to identify how the program should behave
- Are used to verify:
- Steps of an operation
- Algorithm input and output values
- Function parameters
- Flow of control
- Hardware properties
- Test results
Example (from regular expression engine)
if (dp != NULL)
break;
/* uh-oh... we couldn't find a subexpression-level match */
assert(g->backrefs); /* must be back references doing it */
assert(g->nplus == 0 || m->lastpos != NULL);
Assertions in Java
Example
/*
* Run With java -ea FindMax
*/
class FindMax {
/** Return the maximum number in non-empty array v */
public static int findMax(int v[]) {
int max = Integer.MIN_VALUE;
// Precondition: v[] is not empty
assert v.length > 0 : "v[] is empty";
// Precondition: max <= v[i] for every i
for (int i = 0; i < v.length; i++)
assert max <= v[i] : "Found value < MIN_VALUE";
// Locate the real maximum value
for (int i = 0; i < v.length; i++)
if (v[i] > max)
max = v[i];
// Postcondition: max >= v[i] for every i
for (int i = 0; i < v.length; i++)
assert max >= v[i] : "Found value > MIN_VALUE";
return max;
}
// Test harness
public static void main(String argv[]) {
int t[] = new int[5];
t[0] = 4;
t[1] = -4;
t[2] = 145;
t[3] = 0;
t[4] = Integer.MIN_VALUE;
System.out.println("Max value is " + findMax(t));
}
}
Further Reading
- Moshe Bar and Karl Franz
Fogel.
Open
Source Development with CVS.
The Coriolis Group, Scottsdale, AZ, 2001.
- Kent Beck and Erich
Gamma.
Test infected: Programmers love writing tests.
Java Report, 3(7):37–50, July 1998.
- Stephen P. Berczuk
and Brad Appleton.
Software Configuration Management Patterns: Effective Teamwork, Practical
Integration.
Addison-Wesley, Boston, MA, 2002.
- Don Bolinger, Tan
Bronson, and Mike Loukides.
Applying RCS and SCCS: From Source Control to Project Control.
O'Reilly and Associates, Sebastopol, CA, 1995.
- Per Cederqvist
et al.
Version Management with
CVS, 2001.
Available online http://www.cvshome.org/docs/manual/ (January 2002).
- Roger F. Crew.
ASTLOG: A
language for examining abstract syntax trees.
In Ramming [Ramming, 1997], pages 229–242.
- Rohan T. Douglas.
Error message management.
Dr. Dobb's Journal, 15(1):48–51, January 1990.
- Paul Dubois and
Gigi Estabrook.
Software Portability with Imake.
O'Reilly and Associates, Sebastopol, CA, second edition, 1996.
- Rickard E. Faith, Lars S.
Nyland, and Jan F. Prins.
KHEPERA: A
system for rapid implementation of domain specific languages.
In Ramming [Ramming, 1997], pages 243–255.
- Stuart I. Feldman.
Make—a program for maintaining computer programs.
Software: Practice & Experience, 9(4):255–265, 1979.
- Cem Kaner, Jack Falk, and
Hung Quoc Nguyen.
Testing Computer Software.
Wiley, New York, second edition, 1999.
- Nils Klarlund
and Michael I. Schwarzbach.
A
domain-specific language for regular sets of strings and trees.
In Ramming [Ramming, 1997], pages 145–156.
- Andrew Oram and Steve
Talbott.
Managing Projects with make.
O'Reilly and Associates, Sebastopol, CA, second edition, 1991.
- J. Christopher Ramming, editor.
USENIX Conference on Domain-Specific Languages, Berkeley, CA, October
1997. Usenix Association.
- M. J. Rochkind.
The source code control system.
IEEE Transactions on Software Engineering, SE-1(4):255–265,
1975.
- Diomidis
Spinellis and V. Guruprasad.
Lightweight languages as software engineering tools ( http://www.spinellis.gr/pubs/conf/1997-DSL-Lightweight/html/paper.html).
In Ramming [Ramming, 1997], pages 67–76.
- Diomidis Spinellis.
Implementing Haskell: Language implementation as a tool building
exercise.
Structured Programming (Software Concepts and Tools), 14:37–48,
1993.
- Diomidis Spinellis.
Reliable software implementation using domain specific languages ( http://www.spinellis.gr/pubs/conf/1999-ESREL-SoftRel/html/dsl.html).
In G. I. Schuëller and P. Kafka, editors, Proceedings ESREL '99 —
The Tenth European Conference on Safety and Reliability, pages
627–631, Rotterdam, September 1999. ESRA, VDI, TUM, A. A. Balkema.
- Diomidis Spinellis.
Notable design patterns for domain specific languages ( http://www.spinellis.gr/pubs/jrnl/2000-JSS-DSLPatterns/html/dslpat.html).
Journal of Systems and Software, 56(1):91–99, February 2001.
- Diomidis Spinellis.
Code Reading: The Open
Source Perspective, pages 179–224.
Effective Software Development Series. Addison-Wesley, Boston, MA, 2003.
- James M. Stichnoth
and Thomas Gross.
Code
composition as an implementation language for compilers.
In Ramming [Ramming, 1997], pages 119–132.
- Walter F. Tichy.
Design, implementation, and evaluation of a revision control system,.
In Proceedings of the 6th International Conference on Software
Engineering. IEEE, September 1982.
- Gary V. Vaughan, Ben
Elliston, Tom Tromey, and Ian Lance Taylor.
GNU
Autoconf, Automake, and Libtool.
New Riders Publishing, Indianapolis, IN, 2000.
Exercises and Discussion Topics
-
Propose ways to quickly determine whether a given project follows
one of the design or implementation techniques we described.
Test your proposal against one of the major projects available in
the course's reference source code .
-
Locate recommended design and implementation practices in a software
engineering book.
Explain how these are reflected in a project's source code.
-
How can a standardized directory structure be utilized for automating
aspects of the software development process?
-
Examine and describe the directory structure of an installed version
of Microsoft Windows.
-
Configure a program from the course's reference source code to compile and run in a
supported environment.
Examine the configuration process output files and
manually verify four different configuration options.
-
Read the source code of a configuration script,
explaining how two different configuration options
are determined.
-
How do you track revision histories in your environment?
If you are not using a revision control system, consider
adopting one.
-
How is the build process managed in your favorite
integrated development environment?
Discuss the advantages and shortcomings of the employed approach.
-
Locate two different test cases in the course's reference source code and explain how their
execution is, or could be, automated.
-
The Perl test cases suggest that
``A good test
case has most of these attributes: fewest possible
number of lines; few dependencies on external
commands, modules, or libraries; runs on most
platforms unimpeded; and is self-documenting.''
Locate test cases in the course's reference source code and judge them
against the above standard.
Version Control
Version Control
- Tracks project across time
- Collaboration, documentation, undo
- Open source systems in order of increasing capabilities
- RCS
- CVS
- Subversion, Git, Mercurial, Bazaar
The Great Debate
- Centralized (CVS, Subversion)
- Single source of truth
- Pre-commit checks
- Distributed (Bazaar, Git, Mercurial, darcs)
- Can work off line
- Easier participation (no committer bit)
- Cheap local commits
Version Control Operations
- Central repository keeps historical data
- A commit or check-in adds a change into the repository
- A pull or update or check-out
retrieves a version from the repository
- Versions are tracked with version numbers or hashes
- Numeric or hashes: automatically assigned
- Symbolic: assigned by humans to track milestones
- On some VCSs Keywords in the source files are automatically
expanded, identifying version, status, author, etc.
Branching
- Development can split into different branches e.g.
- Trunk or head
- Stable or release
- Feature
- Vendor
- Branches can join again
Revision Tree Example
cat.c revision tree
Life Under a VCS
- Import or clone or checkout
- Update or synchronize or pull
- Commit
- (Optional) push
- Label or tag
The Goodies
- Identify conflicting changes
- Versioning:
- Annotated listings
- Version history
- File differences
- Branching
- Source of truth
- Metrics
Example of a Change Log
RCS file: /cvsroot/basesrc/bin/cat/cat.c,v
Working file: cat.c
head: 1.28
branch:
locks: strict
access list:
symbolic names:
netbsd-1-5-PATCH002: 1.23
[...]
netbsd-1-5: 1.23.0.4
[...]
netbsd-1-2-BETA: 1.11
netbsd-1-2-base: 1.11
netbsd-1-2: 1.11.0.6
[...]
keyword substitution: kv
total revisions: 33; selected revisions: 33
description:
----------------------------
revision 1.28
date: 2001/09/16 12:12:13; author: wiz; state: Exp; lines: +6 -4
Some KNF, via patch by Petri Koistinen in private mail.
----------------------------
revision 1.27
date: 2001/07/29 22:40:57; author: wiz; state: Exp; lines: +6 -6
Some style improvements. [Nearly] #13592 by Petri Koistinen.
[...]
----------------------------
revision 1.15.2.1
date: 1998/02/08 21:45:36; author: mellon; state: Exp; lines: +18 -9
Pull up 1.16 and 1.17 (kleink)
Example of a File Difference List
$ cvs diff -c -r1.12 -r1.13 basesrc/bin/cat/cat.c
Index: basesrc/bin/cat/cat.c
===================================================================
RCS file: /cvsroot/basesrc/bin/cat/cat.c,v
retrieving revision 1.12
retrieving revision 1.13
diff -c -r1.12 -r1.13
[...]
***************
*** 136,141 ****
--- 136,142 ----
fp = stdin;
else if ((fp = fopen(*argv, "r")) == NULL) {
warn("%s", *argv);
+ rval = 1;
++argv;
continue;
}
Example of an Annotated Listing
1.166 (jhb 16-Oct-02): fdp = p->p_fd;
1.114 (alfred 13-Jan-02): FILEDESC_LOCK(fdp);
1.157 (iedowse 02-Sep-02): if ((unsigned)fd >= fdp->fd_nfiles ||
1.157 (iedowse 02-Sep-02): (fp = fdp->fd_ofiles[fd]) == NULL) {
1.114 (alfred 13-Jan-02): FILEDESC_UNLOCK(fdp);
1.106 (dillon 01-Sep-01): error = EBADF;
1.106 (dillon 01-Sep-01): goto done2;
1.106 (dillon 01-Sep-01): }
1.157 (iedowse 02-Sep-02): pop = &fdp->fd_ofileflags[fd];
1.94 (dillon 18-Nov-00):
1.157 (iedowse 02-Sep-02): switch (cmd) {
1.1 (rgrimes 24-May-94): case F_DUPFD:
1.241 (rwatson 07-Aug-04): /* mtx_assert(&Giant, MA_NOTOWNED); */
1.158 (jhb 03-Sep-02): FILEDESC_UNLOCK(fdp);
Extracting Metrics
- Changes per source file
- Changes per author
- Authors per source file
- Lines per day
- Changes corresponding to bug categories
Tracking Example
Maintainability index over time in the FreeBSD kernel.
Best Practices
- Put everything under version control
- Use VCS on your personal projects
- Think carefully about file name and organization
- Perform one separate commit for every change
- Label all releases
- Establish and follow policies and procedures
Subversion Cheat Sheet
(Results appear after the # sign.)
Create Repository
pwd
# /home/dds
svnadmin create repo
chmod -R go+rwX repo/
Create Project Structure
mkdir template
cd template
mkdir myproject
cd myproject/
mkdir trunk tags branches
cd ..
pwd
# /home/dds/template
ls
# myproject
Initial Addition of a Project to the Repository
pwd
# /home/dds/template
svn import . file:///home/dds/repo/ --message 'Initial repository layout'
# Adding myproject
# Adding myproject/trunk
# Adding myproject/branches
# Adding myproject/tags
#
# Committed revision 1.
cd ..
Initial Checkout from the Repository
pwd
# /home/dds/
mkdir work
cd work
svn co file:///home/dds/repo/myproject
# A myproject/trunk
# A myproject/branches
# A myproject/tags
# Checked out revision 1.
ls
# myproject
Add a File to the Project
pwd
# /home/dds/work
cd myproject/trunk
echo hi >file.txt
svn add file.txt
# A file.txt
svn commit --message 'Added file.txt'
# Adding trunk/file.txt
# Transmitting file data .
# Committed revision 2.
Tag a Release and Create a Branch
svn copy file:///home/dds/repo/myproject/trunk file:///home/dds/repo/myproject/tags/version-1.0 --message 'Tag version 1.0'
# Committed revision 3.
svn copy file:///home/dds/repo/myproject/trunk file:///home/dds/repo/myproject/branches/bug-fix-r1.0 --message 'Branch version 1.0 for bug fixes'
# Committed revision 4.
Update with Changes
cd ..
pwd
# /home/dds/work/myproject
svn up
#A branches/bug-fix-r1.0
#A tags/version-1.0
#Updated to revision 4.
Commit a Change
echo hello >file.txt
svn up
# At revision 5.
svn commit -m 'Made greeting more formal'
#Sending trunk/file.txt
#Transmitting file data .
#Committed revision 5.
Git Cheat Sheet
(Results appear after the # sign.)
Create Repository
pwd
# /home/dds
git init repo
chmod -R go+rwX repo/
Create Project
cd repo
mkdir myproject
cd myproject
pwd
# /home/dds/repo/myproject
Add a File to the Project
pwd
# /home/dds/repo
cd myproject
echo "My project repo." >README
git add README (or git add . to include every file present.)
git status
# On branch master
#
# Initial commit
#
# Changes to be committed:
# (use "git rm --cached <file>..." to unstage)
#
# new file: README
#
git commit -m 'README file for repo.'
#[master (root-commit) 138f654] README file for repo.
#1 files changed, 1 insertions(+), 0 deletions(-)
#create mode 100644 README
echo "TODO: configure the repo in $GIT_DIR/config." >>README
git status
# On branch master
# Changes not staged for commit:
# (use "git add <file>..." to update what will be committed)
# (use "git checkout -- <file>..." to discard changes in
# working directory)
#
# modified: README
#
# no changes added to commit (use "git add" and/or "git commit -a")
git commit -a -m 'Updated README file.'
#[master 1376204] Updated README file.
#1 files changed, 1 insertions(+), 0 deletions(-)
Create a Branch
git branch
#* master
git branch bug-fixing
git branch -a
# bug-fixing
#* master
Initial Checkout from the Repository
pwd
# /home/dds/
mkdir work
cd work
git clone file:///home/dds/repo
#Cloning into repo...
#remote: Counting objects: 306, done.
#remote: Compressing objects: 100% (92/92), done.
#remote: Total 306 (delta 214), reused 306 (delta 214)
#Receiving objects: 100% (306/306), 213.91 KiB, done.
#Resolving deltas: 100% (214/214), done.
ls
# repo
Update with Changes
pwd
#/home/dds/work/repo/myproject
git pull
#Already up-to-date.
Work with Branches
pwd
#/home/dds/work/repo/myproject
git checkout bug-fixing
#Branch bug-fixing set up to track remote branch bug-fixing from origin.
#Switched to a new branch 'bug-fixing'.
Commit a Change
pwd
#/home/dds/work/repo/myproject
git branch
#* bug-fixing
echo "TODO: Improve layout in README." >>README
git commit -a -m 'New TODO in README.'
#[bug-fixing d758c7f] New TODO in README.
# 1 files changed, 1 insertions(+), 0 deletions(-)
git push
#Counting objects: 5, done.
#Delta compression using up to 2 threads.
#Compressing objects: 100% (2/2), done.
#Writing objects: 100% (3/3), 324 bytes, done.
#Total 3 (delta 0), reused 0 (delta 0)
#Unpacking objects: 100% (3/3), done.
#To file:///home/dds/repo/
#c6439c5..d758c7f bug-fixing -> bug-fixing
Tag a Release
pwd
#/home/dds/work/repo/myproject
git tag "v1.1"
git push --tags
#Total 0 (delta 0), reused 0 (delta 0)
#To file:///home/dds/repo/
# * [new tag] v1.1 -> v1.1
More Resources
Further Reading
- Bryan O'Sullivan.
Making sense of revision-control systems.
Commun. ACM, 52(9):56–62, 2009.
(doi:10.1145/1562164.1562183 (http://dx.doi.org/10.1145/1562164.1562183))
- M. J. Rochkind.
The source code control system.
IEEE Transactions on Software Engineering, SE-1(4):255–265,
1975.
- Diomidis Spinellis.
Software
engineering glossary, version control, part 2.
IEEE Software, 22(6):c2–c3, November/December 2005.
(doi:10.1109/MS.2005.169 (http://dx.doi.org/10.1109/MS.2005.169))
- Diomidis Spinellis.
Software
engineering glossary, version control, part I.
IEEE Software, 22(5):107, September/October 2005.
(doi:10.1109/MS.2005.141 (http://dx.doi.org/10.1109/MS.2005.141))
- Diomidis Spinellis.
Version
control systems.
IEEE Software, 22(5):108–109, September/October 2005.
(doi:10.1109/MS.2005.140 (http://dx.doi.org/10.1109/MS.2005.140))
- Walter F. Tichy.
Design, implementation, and evaluation of a revision control system,.
In Proceedings of the 6th International Conference on Software
Engineering. IEEE, September 1982.
Build Management
The Build Process
Typical Project Dependencies
Example: Apache Project Dependencies
Example: Mozilla Dependencies
Example: Internet Explorer Dependencies
Tool Types
- Low level: Make, Rake, Ant
- Abstraction based: CMake, Autotools
- Framework-driven: Maven
Makefiles
- Tools like ant and make allow the automatic processing of dependencies
- Dependencies are specified in a special file
- Elements of a Makefile may include
- Variable definitions
- Rules for suffixes
- Dependencies
- Target build instructions
- Pseudo-targets
- Auto-generated dependencies
- Procedural specifications
Part of an Apache Makefile
OBJS= \
modules.o \
$(MODULES) \
main/libmain.a \
$(OSDIR)/libos.a \
ap/libap.a
.c.o:
$(CC) -c $(INCLUDES) $(CFLAGS) $<
.SUFFIXES: .def
.def.a:
emximp -o $@ $<
$(TARGET): $(EXTRA_DEPS) $(SUBTARGET)
lib$(TARGET).$(SHLIB_SUFFIX_NAME): subdirs modules.o
$(CC) -c $(INCLUDES) $(CFLAGS) buildmark.c
[...]
target_static: subdirs modules.o
$(CC) -c $(INCLUDES) $(CFLAGS) buildmark.c
$(CC) $(CFLAGS) $(LDFLAGS) $(LDFLAGS_SHLIB_EXPORT) \
-o $(TARGET) buildmark.o $(OBJS) $(REGLIB) $(EXPATLIB) $(LIBS)
clean:
-rm -f $(TARGET) lib$(TARGET).* *.o
@for i in $(SUBDIRS); do \
echo "===> $(SDP)$$i"; \
( cd $$i && $(MAKE) $(MFLAGS_STATIC) SDP='$(SDP)' $@ ) || exit 1; \
echo "<=== $(SDP)$$i"; \
done
#Dependencies
$(OBJS): Makefile subdirs
# DO NOT REMOVE
buildmark.o: buildmark.c include/ap_config.h include/ap_mmn.h \
include/ap_config_auto.h os/unix/os.h include/ap_ctype.h \
include/hsregex.h include/httpd.h include/ap_alloc.h include/buff.h \
include/ap.h include/util_uri.h
modules.o: modules.c include/httpd.h include/ap_config.h \
include/ap_mmn.h include/ap_config_auto.h os/unix/os.h \
include/ap_ctype.h include/hsregex.h include/ap_alloc.h include/buff.h \
include/ap.h include/util_uri.h include/http_config.h
Make Problems and Solutions
- Complexity
-
- Dry run with make -n
- Debug mode: print variables
- Before lunch: make -k
- Portability
-
The Ant Approach
- Build instructions written in XML
- File typically named
build.xml
- Variable definitions (named properties)
- Targets and dependencies
- By default, the target specified in the default attribute of the project tag is built
- Tasks are part of ant, typically not external programs
Some ant Tasks
Examples of ant built-in tasks:
- mkdir
- copy
- cvs
- delete
- javac
- jar
Example Ant Build File
<project name="Jasper" default="deploy" basedir=".">
<!-- ===================== Initialize Property Values =================== -->
<!-- See "build.properties.sample" in the top level directory for all -->
<!-- property values you must customize for successful building!!! -->
<property file="build.properties"/>
<property file="../build.properties"/>
<property file="${user.home}/build.properties"/>
<!-- Build Defaults -->
<property name="build.compiler" value="classic"/>
<property name="copy.crimson.jar" value="../lib/crimson.jar"/>
<!-- =================== BUILD: Create Directories ====================== -->
<target name="build-prepare">
<available classname="junit.framework.TestCase" property="junit.present" />
<mkdir dir="${jasper.build}"/>
<mkdir dir="${jasper.build}/jasper"/>
<mkdir dir="${jasper.build}/lib"/>
</target>
<!-- =================== BUILD: Copy Static Files ======================= -->
<target name="build-static" depends="build-prepare">
<!-- Executable Commands -->
<copy todir="${jasper.build}/bin">
<fileset dir="src/bin" />
</copy>
<fixcrlf srcdir="${jasper.build}/bin" includes="*.sh" eol="lf"/>
<fixcrlf srcdir="${jasper.build}/bin" includes="*.bat" eol="crlf"/>
<chmod perm="+x" file="${jasper.build}/bin/jasper.sh"/>
<chmod perm="+x" file="${jasper.build}/bin/jspc.sh"/>
<!-- Common Extensions -->
<copy todir="${jasper.build}/jasper" file="${copy.crimson.jar}"/>
<copy todir="${jasper.build}/jasper" file="${copy.jaxp.jar}"/>
</target>
<!-- ================= BUILD: Compile Server Components ================= -->
<target name="build-main" depends="build-static">
<!-- Compile internal server components -->
<javac srcdir="src/share" destdir="${jasper.build}/classes"
debug="${compile.debug}" deprecation="${compile.deprecation}"
optimize="${compile.optimize}"
excludes="**/CVS/**">
<classpath refid="jasper.classpath" />
</javac>
</project>
The Maven Approach
- Build instructions written in XML
- Project Object Model (POM) based on convention over configuration
- File typically named
pom.xml
- Will automatically fetch dependencies
- Building happens through pluggins with sensible defaults
- A standard lifecycle directs the execution of plugins
Maven's Lifecycle
- validate: validate the project is correct and all necessary information is available.
- initialize: initialize build state, e.g. set properties or create directories.
- generate-sources: generate any source code for inclusion in compilation.
- process-sources: process the source code, for example to filter any values.
- generate-resources: generate resources for inclusion in the package.
- process-resources: copy and process the resources into the destination directory, ready for packaging.
- compile: compile the source code of the project.
- process-classes: post-process the generated files from compilation, for example to do bytecode enhancement on Java classes.
- generate-test-sources: generate any test source code for inclusion in compilation.
- process-test-sources: process the test source code, for example to filter any values.
- generate-test-resources: create resources for testing.
- process-test-resources: copy and process the resources into the test destination directory.
- test-compile: compile the test source code into the test destination directory
- process-test-classes: post-process the generated files from test compilation, for example to do bytecode enhancement on Java classes. For Maven 2.0.5 and above.
- test: run tests using a suitable unit testing framework. These tests should not require the code be packaged or deployed.
- prepare-package: perform any operations necessary to prepare a package before the actual packaging. This often results in an unpacked, processed version of the package. (Maven 2.1 and above)
- package: take the compiled code and package it in its distributable format, such as a JAR.
- pre-integration-test: perform actions required before integration tests are executed. This may involve things such as setting up the required environment.
- integration-test: process and deploy the package if necessary into an environment where integration tests can be run.
- post-integration-test: perform actions required after integration tests have been executed. This may including cleaning up the environment.
- verify: run any checks to verify the package is valid and meets quality criteria.
- install: install the package into the local repository, for use as a dependency in other projects locally.
- deploy: done in an integration or release environment, copies the final package to the remote repository for sharing with other developers and projects.
(Source: Apache Maven Lifecycle Reference (http://maven.apache.org/guides/introduction/introduction-to-the-lifecycle.html#Lifecycle_Reference).)
Automating System Configuration
- System configuration: installed packages, settings, running services
- Specify using tools like: Chef, Puppet, cfengine
- Executed specification brings system up to date
- No changes are performed by hand
Best Practices
- Put all the project's elements under the build system
- Minimize work through dependency analysis
- Modularize the build into several targets
- Common include files
- Explore parallel builds to reduce build times
- Provide a way to build from scratch
- Create a tinderbox build
Best Practices (Variables)
Use variables for varying elements:
- Directories
- Tool options
- Web sites
- Version strings
- OS dependencies
Further Reading
- Mark Burgess and Ricky Ralston.
Distributed resource administration using Cfengine.
Softw., Pract. Exper., 27(9):1083–1101, 1997.
- Sun Microsystems, Inc., Santa
Clara, CA.
Sun Studio 12:
Distributed Make (dmake), 2007.
Part No: 819-5273. Available online
http://docs.sun.com/app/docs/doc/819-5273 (http://docs.sun.com/app/docs/doc/819-5273). Accessed 2009-03-13.
- Stuart I. Feldman.
Make—a program for maintaining computer programs.
Software: Practice & Experience, 9(4):255–265, 1979.
- S. McIntosh,
B. Adams, and A.E. Hassan.
The evolution of ANT build systems.
In Mining Software Repositories (MSR), 2010 7th IEEE Working Conference
on, pages 42–51, may 2010.
(doi:10.1109/MSR.2010.5463341 (http://dx.doi.org/10.1109/MSR.2010.5463341))
- Diomidis Spinellis.
Software
builders.
IEEE Software, 25(3):22–23, May/June 2008.
(doi:10.1109/MS.2008.74 (http://dx.doi.org/10.1109/MS.2008.74))
- Diomidis Spinellis.
Don't
install software by hand.
IEEE Software, 29(4):86–87, July/August 2012.
(doi:10.1109/MS.2012.85 (http://dx.doi.org/10.1109/MS.2012.85))
- Jesse Tilly and
Eric M. Burke.
Ant: The
Definitive Guide.
O'Reilly Media, 2002.
Code-Reading Tools
Typical Tasks
- Identify the declarations
- Locate definitions
- Go through the places where an entity is used.
- List deviations from coding standards.
- Discover related code
- Find related comments
- Check for common errors.
- View the code structure.
- Understand how the code interacts with its environment
Regular Expressions
- A regular expression allows the declarative specification of
complex strings.
- Regular expressions consist of:
- letters (that represent themselves) and
- special symbols
- Backslash escapes special symbols
Regular Expression Symbols
^ | Beginning of a line |
$ | End of a line |
. | Any character |
Expression? | The expression zero or one times |
Expression* | The expression zero or more times |
Expression+ | The expression one or more times |
Expression{n} | The expression n times |
Expression{n,} | The expression at least n times |
Expression{n,m} | The expression at least n but no more than m times |
Expression1|Expression2 | The expression1 or the expression2 |
(Expression) | The expression within the brackets |
\1 \2 ... \n | The content of the nth bracket |
Character Classes
[abc] | One of a, b, or c |
[a-z] | A letter from a to z |
[^abc] | Any letter appart from a, b, and c. |
\t | The character tab |
\n | The character newline |
\r | The character carriage return |
\a | The character alert |
\f | The character form feed |
\e | The character escape |
\cx | The character control-x (a-z) |
\d | Digit |
\D | Non-digit |
\s | Space |
\S | Non-space |
The Editor as a Code Browser
- Regular expression searches
- Tag facility
- Outlining and folding
- Bird's eye view with a 10% zoom factor
Code Searching with grep
- Simple wildcard search
- Search for definitions
- Using a word's root
- Locating files: grep on the output of ls
- File list: grep -l
- Use the result of a file list vi `grep -l ...`
- Stream edit the grep result to create scripts
- Postprocess with grep (-v) to eliminate noise (malloc / xmalloc)
- Use sort -u to eliminate duplicates
- Use fgrep to search for a fixed list of strings
- Use find and xargs to obtain filename lists
Locating File Differences
- Uses:
- Differences between versions
- Examining code modifications
- Verifying test results
- The diff program
- Obtaining a context diff (-c)
- Ignoring blanks (-w)
- Visual alternatives
Roll Your Own Tool: Implementation Options
- The Unix shell and tools (sed, awk, grep)
- Scripting language (Groovy, Perl, Python, Ruby)
- Visual Basic
- C, C++, Java
Choosing an Implementation
- A filtering pipeline (e.g. CVS logs): the Unix shell
- Record processing: awk
- Simple line-oriented string processing: sed
- Combination of tasks: scripting language
- Web front end: PHP, SWILL, GWT, Ruby on Rails
- Tight integration with Microsoft products: Visual Basic
- Extreme performance requirements: C, C++, Java
A Simple Grep Program in Java ...
/*
* Globally match regular expression and print
* Modelled after the Unix command with the same name
* D. Spinellis
*/
import java.util.regex.*;
import java.io.*;
class Grep {
public static void main(String args[]) {
if (args.length != 2) {
System.err.println("Usage: Grep pattern file");
System.exit(1);
}
Pattern cre = null; // Compiled RE
try {
cre = Pattern.compile(args[0]);
} catch (PatternSyntaxException e) {
System.err.println("Invalid RE syntax: " + e.getDescription());
System.exit(1);
}
BufferedReader in = null;
try {
in = new BufferedReader(new InputStreamReader(
new FileInputStream(args[1])));
} catch (FileNotFoundException e) {
System.err.println("Unable to open file " +
args[1] + ": " + e.getMessage());
System.exit(1);
}
try {
String s;
while ((s = in.readLine()) != null) {
Matcher m = cre.matcher(s);
if (m.find())
System.out.println(s);
}
} catch (Exception e) {
System.err.println("Error reading line: " + e.getMessage());
System.exit(1);
}
}
}
... And its Equivalent in Perl
#!/usr/bin/perl -n
BEGIN {$pat = shift;}
print if (/$pat/);
Tool Building Advice
- Exploit the capabilities of modern rapid-prototyping languages.
- Start with a simple design, gradually improving it as needed.
- Use heuristics that exploit the lexical structure of the code.
- Tolerate some output noise or silence
- Use other tools to preprocess your input or postprocess your output.
Example: Signature Survey
Using the Compiler
- Generate warning messages.
- Tweak the code to generate error messages.
- Generate program listings.
- Obtain the preprocessor output.
- Examine the generated symbolic (assembly) code.
- Work through the final object code
Compiler Warning Messages
- Expressions associated with portability problems
- Implicit type casts and conversions between different but compatible types
- The use of a constant in place of a conditional expression in if or while statements
- Functions that do not return a value although they should
- Variable declarations that mask other variables with the same name
- Missing enumeration members or extraneous values in a switch statement
- Unknown #pragma options
- Unreferenced variables, structure members, or labels
- Failing to initialize a const declared object
(Depending on the language, some of the above may be errors)
The Compiler as a Code-Reading Tool
- Change the name of a definition and read the errors
- Use intermediate files (preprocessing, symbolic code)
- Read symbolic code to
- To understand how an implementation-defined operation is actually performed
- To see how compiler optimizations affect the resulting code
- To convince yourself that the compiler correctly translates a piece of code, or (rarely) to find a compiler bug
- To verify that hardware-specific code written in C actually performs the operations you expect
- To learn how a particular construct can be written in symbolic code
- Examine the object code using nm (Unix) or dumpbin (Windows)
Code Browsers
Code browsers typicall offer the following facilities:
- Definitions: Locate where an entity is defined.
- References: Go through all places where an entity is used.
- A call graph: List the functions called by a given function.
- A caller graph: List the functions calling a given function.
- A file outline: List the functions, variables, types, and macros defined in a file.
Code Browsers in OO
In OO languages given a class you can find:
- Where the class is defined
- The locations where the class is referenced
- The classes derived from this class
- The base classes for this class
- Its public, protected, and private methods and fields
Beautifiers
- Fix code that is written truly inconsistently without following any formatting standards
- Adopt orphaned code for maintenance
- Create a temporary version of the code to help you decipher it
- Integrate code under the common roof of a larger project
Other related tools:
- Pretty-printers (e.g. vgrind)
- cdecl
Runtime Tools
- Tracing (strace, truss, apispy)
- Function profiling (gprof)
- Basic block counting
- Debuggers
- Step-by-step program execution
- Code breakpoints
- Data breakpoints
- Variable displays
- Stack dump
- Structure browsing
Drawing Diagrams
- Call graphs
- Class hierarchies
- Concrete object instances
- Data structures (trees, lists, and so on)
- Bit fields and corresponding masks in bit-mapped registers
- State transition diagrams
- Strings or arrays and corresponding indices or pointers
- Entity-relationship diagrams
Lab Tasks (Java)
- Locate the source code of ant
- Download it
- Examine the directory structure
- Browse the documentation
- Look for build instructions
- Compile
- Create a list of classes
- Display the help message
- Locate the help message in the source code
- Create a backup of the file (CVS/RCS)
- Modify the help message
- Compile and test
- View the file differences
- Modify the name of a method
Lab Tasks (Unix)
- make 1 (compile / run ls)
- make 2 (morning dressing)
- svn (Tick tack toe)
- cvs (anoncvs FreeBSD; log, diff)
- find, xargs, grep, awk, sort (number of public fields per Java class)
- regex 1 (grep /usr/dict/words)
- regex 2 (vi in source files)
- indent (destroy / fix a program; IOCCC)
- diff (from indent)
- sed / Perl (signature analysis)
- strace (echo, cat, ls)
- gdb (single-step jot)
- gprof
Further Reading
- Even Adams and
Steven S. Muchnick.
Dbxtool: A window-based symbolic debugger for Sun workstations.
Software: Practice & Experience, 16(7):653–669, July 1986.
- A. V. Aho and M. J.
Corasick.
Efficient string matching: an aid to bibliographic search.
Communications of the ACM, 18(6):333–340, 1975.
- Alfred V. Aho, John E.
Hopcroft, and Jeffrey D. Ullman.
The
Design and Analysis of Computer Algorithms.
Addison-Wesley, Reading, MA, 1974.
- Alfred V. Aho, Ravi Sethi,
and Jeffrey D. Ullman.
Compilers, Principles, Techniques, and Tools.
Addison-Wesley, Reading, MA, 1985.
- Alfred V. Aho, Brian W.
Kernighan, and Peter J. Weinberger.
The
AWK Programming Language.
Addison-Wesley, Reading, MA, 1988.
- Kent Beck.
Extreme Programming Explained: Embrace Change.
Addison-Wesley, Boston, MA, 2000.
- Bruce Blinn.
Portable Shell Programming: An Extensive Collection of Bourne Shell
Examples.
Prentice Hall, Englewood Cliffs, NJ, 1995.
- Grady Booch, James
Rumbaugh, and Ivar Jacobson.
The
Unified Modeling Language User Guide.
Addison-Wesley, Reading, MA, 1999.
- S. R. Bourne.
An introduction to the UNIX shell.
In Unix Programmer's Manual [Unix Programmer's Manual, 1979].
Also available online http://plan9.bell-labs.com/7thEdMan/.
- J. Brady.
A theory of productivity in the creative process.
IEEE Computer Graphics and Applications, 6(5):25–34, May
1986.
- Tom
Christiansen and Nathan Torkington.
The
Perl Cookbook.
O'Reilly and Associates, Sebastopol, CA, 1998.
- Ward Cunningham.
Signature survey: A method for
browsing unfamiliar code.
Available online http://c2.com/doc/SignatureSurvey/ (July 2002), 2001.
OOPSLA 2001 Software Archeology Workshop position statement.
- Tom DeMarco and
Timothy R. Lister.
Peopleware: Productive Projects and Teams.
Dorset House Publishing, New York, second edition, 1999.
- Nachum
Dershowitz and Edward M. Reingold.
Calendrical Calculations.
Cambridge University Press, Cambridge, 1997.
- Dale Dougherty
and Arnold Robbins.
sed and awk.
O'Reilly and Associates, Sebastopol, CA, second edition, 1997.
- Martin Fowler and
Kendall Scott.
UML Distilled: Applying the Standard Object Modeling Language.
Addison-Wesley, Boston, MA, second edition, 2000.
- Jeffrey E. Friedl and
Andy Oram, editors.
Mastering Regular Expressions: Powerful Techniques for Perl and Other
Tools.
O'Reilly and Associates, Sebastopol, CA, second edition, 2002.
- S. Graham, P. Kessler,
and M. K. McKusick.
gprof: A call graph execution profiler.
ACM SIGPLAN Notices, 6(17):120–126, June 1982.
Proceedings of the SIGPLAN '82 Symposium on Compiler Construction.
- S. Graham, P. Kessler,
and M. K. McKusick.
An execution profiler for modular programs.
Software: Practice & Experience, 13:671–685, 1983.
- Mark R. Horton.
Portable C Software.
Prentice-Hall, Englewood Cliffs, NJ, 1990.
- Stephen C. Johnson.
Lint, a C program checker.
Computer Science Technical Report 65, Bell Laboratories, Murray Hill, NJ,
December 1977.
- Stephen C. Johnson.
Lint, a C program checker.
In Unix Programmer's Manual [Unix Programmer's Manual, 1979].
Also available online http://plan9.bell-labs.com/7thEdMan/.
- Brian W. Kernighan
and Rob Pike.
The
UNIX Programming Environment.
Prentice-Hall, Englewood Cliffs, NJ, 1984.
- Andrew Koenig.
C
Traps and Pitfalls.
Addison-Wesley, Reading, MA, 1988.
- David G. Korn.
Porting Unix to Windows NT (http://www.usenix.org/publications/library/proceedings/ana97/korn.html).
In Proceedings of the USENIX 1997 Annual Technical Conference,
Berkeley, CA, January 1997. Usenix Association.
- Don Libes.
Obfuscated C and Other Mysteries.
John Wiley and Sons, New York, 1993.
- Mark Lutz.
Programming Python.
O'Reilly and Associates, Sebastopol, CA, second edition, 2002.
- Udi Manber and Sun Wu.
GLIMPSE: A tool to search through entire file systems.
In Jeff Mogul, editor, USENIX Conference Proceedings, pages
23–32, Berkeley, CA, Winter 1994. Usenix Association.
- L. E. McMahon.
SED—a non-interactive text editor.
In Unix Programmer's Manual [Unix Programmer's Manual, 1979].
Also available online http://plan9.bell-labs.com/7thEdMan/.
- Cameron Newham and
Bill Rosenblatt.
Learning the Bash Shell.
O'Reilly and Associates, Sebastopol, CA, second edition, 1998.
- Geoffrey J. Noer.
Cygwin32: A free Win32 porting layer for UNIX applications (http://www.usenix.org/publications/library/proceedings/usenix-nt98/noer.html).
In Susan Owicki and Thorsten von Eicken, editors, Proceedings of the 2nd
USENIX Windows NT Symposium, Berkeley, CA, August 1998. Usenix
Association.
- Paul W. Oman and Curtis R.
Cook.
Typographic style is more than cosmetic.
Communications of the ACM, 33(5):506–520, May 1990.
- Henry Rabinowitz
and Chaim Schaap.
Portable C.
Prentice Hall, Englewood Cliffs, NJ, 1990.
- James Rumbaugh, Ivar
Jacobson, and Grady Booch.
The
Unified Modeling Language Reference Manual.
Addison-Wesley, Reading, MA, 1999.
- Randal L. Schwartz, Tom
Christiansen, and Larry Wall.
Learning Perl.
O'Reilly and Associates, Sebastopol, CA, third edition, 2001.
- Diomidis Spinellis.
Outwit: Unix tool-based programming meets the Windows world ( http://www.spinellis.gr/pubs/conf/2000-Usenix-outwit/html/utool.html).
In Christopher Small, editor, USENIX 2000 Technical Conference
Proceedings, pages 149–158, Berkeley, CA, June 2000. Usenix
Association.
- Diomidis Spinellis.
Notable design patterns for domain specific languages ( http://www.spinellis.gr/pubs/jrnl/2000-JSS-DSLPatterns/html/dslpat.html).
Journal of Systems and Software, 56(1):91–99, February 2001.
- Diomidis Spinellis.
Code Reading: The Open
Source Perspective, pages 339–376.
Effective Software Development Series. Addison-Wesley, Boston, MA, 2003.
- Diomidis Spinellis.
Tool writing: A forgotten art? ( http://www.spinellis.gr/pubs/jrnl/2005-IEEESW-TotT/html/v22n4.html).
IEEE Software, 22(4):9–11, July/August 2005.
(doi:10.1109/MS.2005.111 (http://dx.doi.org/10.1109/MS.2005.111))
- Diomidis Spinellis.
The tools at hand ( http://www.spinellis.gr/pubs/jrnl/2005-IEEESW-TotT/html/v22n1.html).
IEEE Software, 22(1):10–13, January/February 2005.
(doi:10.1109/MS.2005.23 (http://dx.doi.org/10.1109/MS.2005.23))
- Sriram Srinivasan.
Advanced Perl Programming.
O'Reilly and Associates, Sebastopol, CA, 1997.
- Edward R. Tufte.
The
Visual Display of Quantitative Information.
Graphics Press, Cheshire, CT, 1983.
- UNIX
Programmer's Manual. Volume 2—Supplementary Documents.
Bell Telephone Laboratories, Murray Hill, NJ, seventh edition, 1979.
Also available online http://plan9.bell-labs.com/7thEdMan/.
- Larry Wall, Tom
Christiansen, Randal L. Schwartz, and Stephen Potter.
Programming Perl.
O'Reilly and Associates, Sebastopol, CA, third edition, 2000.
- Stephen R. Walli.
OPENNT: UNIX application portability to Windows NT via an
alternative environment subsystem.
In Ed Lazowska and Michael B. Jones, editors, Proceedings of the USENIX
Windows NT Symposium, Berkeley, CA, August 1997. Usenix
Association.
Exercises and Discussion Topics
-
If you are using Windows try installing a Unix-type operating
system on a spare (old) computer, or disk drive, or disk partition.
Linux (e.g. Ubunutu), FreeBSD, and Solaris are some possible choices
for a free Unix-like operating system.
Alternatively, install on your Windows machine the Cygwin
environment, which offers comparable functionality.
-
Learn the regular expression syntax and the related
find commands provided by the editor you are using.
-
Write regular expressions to locate integers, floating point numbers,
and a given word within a character string.
Where would you use such expressions?
-
Learn about and experiment with the tag facility of the editor you
are using.
-
Often programmers identify code parts that need further attention
using a special marker such as XXX or FIXME.
Search for, count, and examine such instances in the source code tree.
-
Propose a simple way to calculate a "measure of similarity" metric for
two different files.
Apply this metric to files in the source collection that share the same
filename and create a list of files that could benefit from a structured
approach towards code reuse.
-
Identify code-reading tasks that can benefit by a custom-built tool.
Briefly describe how each such tool could be built.
Implement one of the tools you described.
-
Provide (or, better yet, locate) code examples for some of the compiler
warnings discussed in this lecture.
Can your compiler detect them?
Are there legitimate reasons for using such code?
-
Many warnings generated by C compilers would be treated as errors in
other strongly typed languages.
Find a list of warnings that your C compiler generates, and mark those
that identify code with legitimate uses.
See how many of them are detected in other languages.
-
Run some of the winning IOCCC (http://www.ioccc.org)
entries
through the C preprocessor and try to understand how the program
works and how the original obfuscation was achieved.
-
Compile a program (with compiler optimizations disabled)
and read the generated symbolic code.
Repeat this process with all optimizations enabled.
Describe how arguments are passed between functions in each case.
-
Format one of the course's example files using a pretty-printer.
-
See if your editor supports user-defined syntax coloring.
Define syntax-coloring for a language that you use and the editor
does not support.
General Purpose Tools
The Unix Toolchest
Available under Unix, GNU/Linux, *BSD.
For Windows:
Working with Unix Tools
- Fetch or generate data
- Selection
- Processing
- Summarizing
- Program development
Data Fetching and Generation
- nm: object files
- nm, ldd: executables
- tar: archives
- ar, jar: libraries
- find: directory hierarchies
- wget: the web
- cvs or svn log or annotate: version control
- jot or dd: generate artificial data
Selection
- awk: delimited records
- cut: fixed width files
- sed: select row parts with regular expressions
- grep: select rows with regular expressions
- Combine the above in pipelines
Processing
- sort: order by multiple keys
- uniq: remove duplicates, count
- diff: compare sequential files
- comm: compare ordered lists
- tr: map between character sets
- recode: map between text representations
- tac/rev: reverse order of lines, characters
- rs: reshape arrays
- join: relational join
- Scripting languages handle more complex tasks
Summarizing
- wc: count lines, characters
- head: first elements
- tail: last elements
- fmt: format into lines
- awk with an END block
Plumbing
- Pass data through pipeline
- tee: tap output for further processing
- xargs: apply command to numerous arguments
- Pipe into a while loop for complicated processing
Example: Analyze Java Files
Examine all Java files located in the directory src, and print the ten files with the highest number of occurrences of a method call to substring.
find src -name ’*.java’ -print |
xargs fgrep -c .substring |
sort -t: -rn -k2 |
head -10
Example: Determine Commit Times
find . -type f |
grep -v CVS |
xargs cvs -d /home/ncvs log -SN 2>/dev/null |
awk '/^date/{
hour = substr($3, 0, 2)
lines[$5 " " hour] += $9
}
END {
for (i in lines) print i, lines[i]
}' |
sed 's/;//g' |
sort >dev-times-lines.dat
join dev-times-lines.dat devlong >dev-times-lines-long.dat
Example Result:Plotted Commit Times
Program Development
- Editors
- GNU Compiler Collection
- Language development tools (bison, flex)
- Tracers and debuggers
- Profilers
- Code formatting tools (cb, indent)
- Version control systems
(More on the above later.)
Outwit Tools
- winclip - Access the windows clipboard
- winreg - manipulate the windows registry
- docprop - read document properties
- odbc - select data from relational databases
- readlink - resolve shell shortcuts
- readlog - access the windows event log
Outwit Examples
Create an list of fax recipients ordered by the number of faxes they have received.
readlog Application |
awk -F: "/Outbound: Information: Fax Sent/{print $12}" |
sort |
uniq -c |
sort -rn
Extracts the email address from all records from the table users which is part of the database userDB and sends them the file message by email.
fmt message |
mail $(odbc DSN=userDB "select email from users")
An Editor Checklist
- Regular expressions
- Syntax coloring
- Folding
- Error highlighting
- Indentation
- Marking of matching delimiters
- On-line help for API elements
- Code browsing
- Multiple buffers
- Macros
- Refactoring support
Two candidates: Emacs, vim.
Further Reading
- Alfred V. Aho, Brian W.
Kernighan, and Peter J. Weinberger.
Awk—a pattern scanning and processing language.
In Unix Programmer's Manual [Unix Programmer's Manual, 1979].
Also available online http://plan9.bell-labs.com/7thEdMan/.
- S. R. Bourne.
An introduction to the UNIX shell.
In Unix Programmer's Manual [Unix Programmer's Manual, 1979].
Also available online http://plan9.bell-labs.com/7thEdMan/.
- Brian W. Kernighan
and Rob Pike.
The UNIX
Programming Environment.
Prentice Hall, Englewood Cliffs, NJ, 1984.
- Geoffrey J. Noer.
Cygwin32:
A free Win32 porting layer for UNIX applications.
In Susan Owicki and Thorsten von Eicken, editors, Proceedings of the 2nd
USENIX Windows NT Symposium, Berkeley, CA, August 1998. Usenix
Association.
- Eric Steven Raymond.
The Art of
Unix Programming.
Addison-Wesley, 2003.
- Dennis M. Ritchie
and Ken Thompson.
The UNIX time-sharing system.
Communications of the ACM, 17(7):365–375, July 1974.
- Diomidis Spinellis.
Outwit:
Unix tool-based programming meets the Windows world.
In Christopher Small, editor, USENIX 2000 Technical Conference
Proceedings, pages 149–158, Berkeley, CA, June 2000. USENIX
Association.
- Diomidis Spinellis.
Dear
editor.
IEEE Software, 22(2):14–15, March/April 2005.
(doi:10.1109/MS.2005.36 (http://dx.doi.org/10.1109/MS.2005.36))
- Diomidis Spinellis.
Software
engineering glossary, version control, part 2.
IEEE Software, 22(6):c2–c3, November/December 2005.
(doi:10.1109/MS.2005.169 (http://dx.doi.org/10.1109/MS.2005.169))
- Diomidis Spinellis.
Software
engineering glossary, version control, part I.
IEEE Software, 22(5):107, September/October 2005.
(doi:10.1109/MS.2005.141 (http://dx.doi.org/10.1109/MS.2005.141))
- Diomidis Spinellis.
Version
control systems.
IEEE Software, 22(5):108–109, September/October 2005.
(doi:10.1109/MS.2005.140 (http://dx.doi.org/10.1109/MS.2005.140))
- Diomidis Spinellis.
Working
with Unix tools.
IEEE Software, 22(6):9–11, November/December 2005.
(doi:10.1109/MS.2005.170 (http://dx.doi.org/10.1109/MS.2005.170))
- UNIX
Programmer's Manual. Volume 2—Supplementary Documents.
Bell Telephone Laboratories, Murray Hill, NJ, seventh edition, 1979.
Also available online http://plan9.bell-labs.com/7thEdMan/.
Exercises and Discussion Topics
- Acquaint yourself with the command-line of the computer you're
using. If it doesn't have ready access to the Unix tools, install
them.
- Evaluate the editor you're using, and adopt a new one if needed.
Performance Measurement and Management
Program States
- Directly executing code (user time)
- The kernel executes code on behalf of the program (system time)
- Waiting for an external operation to complete (idle time)
Also
- Real or clock time
- CPU time
Measuring Workload
- For terminating processes use the time command
$ time make
[...]
real 0m9.380s
user 0m2.580s
sys 0m6.640s
- For servers use top
22:14:03 up 5:22, 2 users, load average: 0.00, 0.01, 0.00
39 processes: 38 sleeping, 1 running, 0 zombie, 0 stopped
CPU states: 0.1% user, 0.4% system, 0.0% nice, 99.5% idle
Workload and Tools
Profile |
r >>u+s |
s >u |
u ≈ r |
Characterization |
I/O-bound |
kernel-bound |
CPU-bound |
Tools |
Disk, net, VM stats, packet dumps |
System call tracing |
Function profiling, basic block counting |
System Monitoring Tools
- I/O
- iostat (under *BSD), vmstat (Linux)
- Memory
- vmstat, free (Linux)
- Network
- netstat
Example: virtual memory statistics while executing the find command
$ vmstat 1
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
1 0 0 0 52296 3476 50092 0 0 340 0 189 190 0 6 94
0 1 0 0 51736 3588 50428 0 0 448 0 214 243 0 5 95
1 0 0 0 51072 3716 50724 0 0 424 0 210 218 0 5 95
0 1 0 0 50596 3848 50968 0 0 376 0 196 204 0 4 96
1 0 0 0 49952 3976 51304 0 0 464 0 220 247 0 6 94
0 1 0 0 49556 4068 51512 0 0 300 0 177 172 2 2 96
1 0 0 0 49024 4132 51816 0 0 368 0 196 195 0 7 93
1 0 0 0 48228 4200 52416 0 0 668 0 270 307 0 9 91
1 0 0 0 47668 4352 52712 0 0 448 0 215 227 0 5 95
1 0 0 0 47048 4412 53092 0 0 440 0 213 244 1 5 94
0 1 0 0 44912 4668 54080 0 0 432 0 211 225 0 4 96
Profiling System Calls
- The strace -c flag counts calls and reports time
- The results typically follow a Pareto distribution
Example:
$ strace -c find . -name test
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
66.06 0.367503 584 629 getdents64
26.31 0.146389 80 1826 lstat64
2.27 0.012621 40 313 fcntl64
1.83 0.010168 31 326 10 open
1.78 0.009894 16 625 chdir
1.23 0.006846 22 316 fstat64
[...]
Function Profiling
- Statistical (prof)
- Graph-based (gprof, Java)
- Pinpoint hotspots
Profiling C/C++ Code with Gprof
- Compile with gcc -pg
- Execute program; a file gmon.out will be produced
- The gprof program produces the report
A Flat Profile
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls ms/call ms/call name
6.35 0.33 0.33 _Unwind_SjLj_Register
5.77 0.63 0.30 10192020 0.00 0.00 __gnu_norm::__deque_buf_size(unsigned int)
1.54 1.53 0.08 10 8.00 118.33 parse_parse()
1.35 1.60 0.07 5096343 0.00 0.00 operator==(Tokid, Tokid)
A Call Graph Profile
index % time self children called name
10 Pdtoken::process_pragma() <cycle 1> [360]
[4] 22.8 0.08 1.10 10 parse_parse() <cycle 1> [4]
0.00 0.42 8348/17593 Token::unify(Token const&, Token const&) [5]
0.00 0.26 4328/4328 Type::declare() [28]
0.00 0.22 3511/3511 completed_typedef(Type) [33]
0.00 0.03 3529/3540 Block::enter() [220]
0.00 0.02 7402/27543 obj_lookup(std::string const&) [94]
0.00 0.02 812361/834736 YYSTYPE::operator=(YYSTYPE const&) [324]
0.01 0.01 85359/295237 Type::~Type() <cycle 3> [429]
0.00 0.02 1690/1690 Call::register_call(Token const&, Id const*) [331]
0.00 0.02 11362/11362 Type::set_abstract(Type) <cycle 4> [581]
Profiling Java Code with EJP
- Add the tracer library to your library path
- Run program under the Java VM with -Xruntracer
- View the results with the EJP Presenter
A Java Program Profile
Basic Block Counting in C/C++ Code
To investigate algorithms count the number of times each line is executed.
- Compile with gcc -pg -g -fprofile-arcs -ftest-coverage
- Execute program; a file name.bb will be created
- Run gcov -l name.bb
- A .gcov file for each source file will be created
Example:
main(int argc, char *argv[])
{
1 int i, j, k;
1 int a = 0;
11 for (i = 0; i < 10; i++)
110 for (j = 0; j < 10; j++)
1100 for (k = 0; k < 10; k++)
1000 a += i + j + k;
1 }
Architectural Inefficiencies
- Cache misses
- Branch mispredicts
- Front end stalls (failure to deliver enough instructions)
- Cache pollution through address aliasing (set associativity issues)
- Long latency instructions
- TLB misses
Architectural Profiling Tools
- Processor event monitoring
- Oprofile (http://oprofile.sourceforge.net/) (Linux)
- Locality optimizations
- SLO: Suggestions for Locality Optimizations (http://slo.sourceforge.net/)
Events that Oprofile can analyze on a Core 2 Intel CPU.
- Clock cycles when not halted
- number of instructions retired
- number of L2 cache requests
- L2 cache demand requests from this core
- Page table walk events
- cycles divider is busy
- L1 cacheable data read operations
- Bus cycles when data is sent on the bus
- number of instruction fetch misses
- Branch predicted taken with bubble 1
- [100 more]
Memory Performance
Memory as important as CPU cycles
- Difference in latency between memory types
- Large data sets
- Embedded applications
- Bandwidth requirements
Again, we need to profile
Memory Profiling C/C++ Code with Valgrind
- Create a debug build (-g)
- Run the program under valgrind
- Examine valgrind's output
- Can also detect memory leaks
Example: sed under valgrind
echo ':abcdefgh: : :' |
valgrind --tool=massif ./sed -f TEST/hanoi.sed
Inspection and Testing
Example: Processor Testing
Intel 8088 processor register test routine from IBM's 1981 PC BIOS.
The carry flag is used as the loop index.
Static Verification
Examples:
Characteristics:
- Detection of common errors
- Various levels of inference
- False positives and negatives
Identified Issues
- Possible bugs
- Portability problems (C/C++)
- Function return values (C/C++)
- Dead code
- Overcomplicated expressions
- Suboptimal code
- Duplicate code
Some FindBugs Issues
(FindBugs reports more than 200 issues.)
- An apparent infinite recursive loop.
- A container is added to itself.
- A volatile reference to an array doesn't treat the array elements as volatile
- Usage of GetResource may be unsafe if class is extended
- Creates an empty jar file entry
- Dubious catching of IllegalMonitorStateException
- Method performs math using floating point precision
- Class implements Cloneable but does not define or use clone method
- clone method does not call super.clone()
- Method might drop exception
- Method might ignore exception
- Method invokes dubious new String(String) constructor; just use the argument
- Method invokes dubious new String() constructor; just use ""
A FindBugs Run
GCC Best Practices
- Compile C/C++ with all warnings enabled (gcc -Wall)
- When warnings are enabled, alse enable optimization (gcc -O)
- Treat warnings as errors (gcc -Werror)
- Compile a debug version of C++ code defining -D_GLIBCXX_DEBUG -D_GLIBCXX_DEBUG_PEDANTIC
- Annotate printf-like functions. Example:
extern void argp_error (__const struct argp_state *__restrict __state,
__const char *__restrict __fmt, ...)
__attribute__ ((__format__ (__printf__, 2, 3)));
(Other formats include scanf, wprintf, wscanf).
C Compilation Example
main(char *argv[])
{
int a, b, c;
a = b;
printf("%g\n", a);
}
Plain compile:
$ gcc t.c
$
Compile with warnings:
$ gcc -O4 -Wall -Werror t.c
cc1: warnings being treated as errors
t.c:9: warning: return-type defaults to `int'
t.c:9: warning: first argument of `main' should be `int'
t.c:9: warning: `main' takes only zero or two arguments
t.c: In function `main':
t.c:13: warning: implicit declaration of function `printf'
t.c:13: warning: double format, different type arg (arg 2)
t.c:10: warning: unused variable `c'
t.c:14: warning: control reaches end of non-void function
t.c:10: warning: `b' might be used uninitialized in this function
Lint Best Practices
- Include a lint pass in your build
- Comment case statements lacking a break with FALLTHROUGH
- Comment uneachable code (exit, longmp) with NOTREACHED
- Comment functions taking format arguments with PRINTFLIKE, SCANFLIKE
- Sparingly use #ifndef lint to silence lint warnings
Tracing Tools
Interface | Tool |
Operating System | strace |
Library | ltrace |
Network | tcpdump |
Resource Snapshot | lsof |
Using Tracing Tools
- Locate bugs without using the source code
- Explain an application's behavior
- Determine undocumented features
- Discover appropriate APIs
- Locate performance bottlenecks
[sl]trace Tips and Tricks
- Trace running processes using the -p flag
- Send output to a file using the -o flag
- Read complete strings using a large argument (e.g. 1024) to -s
- Trace forked processes with the -f flag
Example: System Call Tracing
Which shared libraries is dot loading?
$ strace dot </dev/null 2>&1 | fgrep .so | fgrep -v ENOENT
open("/etc/ld.so.cache", O_RDONLY) = 3
open("/usr/lib/libfreetype.so.6", O_RDONLY) = 3
open("/usr/lib/libpng.so.2", O_RDONLY) = 3
open("/usr/lib/libjpeg.so.62", O_RDONLY) = 3
open("/usr/lib/libz.so.1", O_RDONLY) = 3
open("/lib/libm.so.6", O_RDONLY) = 3
open("/lib/libc.so.6", O_RDONLY) = 3
Example: Library Call Tracing
How does the whoami command obtain its data?
$ ltrace whoami
[...]
geteuid() = 1000
getpwuid(1000, 0x40013678, 0xbffffcac, 0x08048cff, 0x4012ee48) = 0x401310f0
puts("dds"dds
) = 4
exit(0 <unfinished ...>
Unit Testing
- JUnit (http://www.junit.org/) and its friends is the name of the game
- Based on test classes
- Test class contains method annotated
@Before
- Test methods are annotated with
@Test
- Test methods call
assertTrue
- main calls
junit.textui.TestRunner.run(TestClass.class)
JUnit Example
Test Class and Fields
public class RationalTest {
private Rational r12;
private Rational r13;
private Rational r56;
}
Initialization Code
@Before public void setUp() {
r12 = new Rational(1, 2);
r13 = new Rational(1, 3);
r56 = new Rational(5, 6);
}
A Test Method
@Test public void simpleAdd() {
Rational result = r12.add(r13);
assertTrue(result.equals(r56));
}
Run the Class's Test Methods
public static void main (String... args) {
junit.textui.TestRunner.run(RationalTest.class);
}
Textual Regression Testing
- Used for batch oriented applications
- A loop goes over the input files
- Each input file creates a new output file
- Output file compared with correct file
- A "priming" mode allows the creation of the correct files
Example: Testing the Sort Program
#!/bin/sh
FILES=`cd input; echo *`
# Prime test data
if [ "$1" = "-p" ] ; then
for i in $FILES ; do
sort input/$i >old/$i
done
fi
# Compare old with new result
for i in $FILES ; do
sort input/$i >new/$i
if diff old/$i new/$i ; then
echo "OK $i"
else
echo "FAIL $i"
exit 1
fi
done
Test Coverage in C/C++ Code
To investigate test coverage, you can use basic block profiling.
- Compile with gcc -pg -g -fprofile-arcs -ftest-coverage
- Execute program; a file name.bb will be created
- Run gcov -n name.bb
- You will get a report of how many lines were executed
- Numbers less than 100% indicate a lack of coverage
Example:
83.33% of 650 source lines executed in file hilbert.c
Example: Test coverage of Perl's source code versus the number of executed test cases
Further Reading
- Kent Beck and Erich
Gamma.
Test infected: Programmers love writing tests.
Java Report, 3(7):37–50, July 1998.
- David Hovemeyer and
William Pugh.
Finding bugs is easy.
ACM SIGPLAN Notices, 39(12):92–106, December 2004.
OOPSLA 2004 Onward! Track.
- Institute of Electrical and Electronics Engineers.
IEEE Standard for Software Unit Testing.
IEEE, New York, 1987.
IEEE Standard 1008-1987.
- Richard McDougall, Jim
Mauro, and Brendan Gregg.
Solaris
Performance and Tools: DTrace and MDB Techniques for Solaris 10 and
OpenSolaris.
Prentice Hall PTR, Upper Saddle River, 2006.
- Diomidis Spinellis.
I
spy.
IEEE Software, 24(2):16–17, March/April 2007.
(doi:10.1109/MS.2007.43 (http://dx.doi.org/10.1109/MS.2007.43))
Coding Standards and Conventions
File Names and Organization
- Naming conventions allow searching and navigation across files
Example C header files
- Style guides specify how elements are ordered within a file
Java example:
- class variables
- instance variables
- constructors
- methods
and
- Private
- Protected
- Public
Common File Names
File Name | Contents |
README | Project overview |
MANIFEST | List of project files with brief explanations |
INSTALL | Installation instructions |
Copying | Licensing information |
TODO | Wish list for future extensions |
NEWS | Documentation on user-visible changes |
Changes | Code change summary |
configure | Platform configuration script |
Makefile | Build specification |
Makefile.SH | Shell script producing the above |
config.h | Platform configuration definitions |
config_h.SH | Shell script producing the above |
patchlevel.h | Defines the project release version |
Common File Extensions
Extention | Contents |
.man | Unix manual page source |
.encoding | Text in a particular encoding (e.g. .utf8) |
.language-code | Text in a particular language (e.g. .de for German) |
.lib | Library---collection of object files |
.s | Assembly language source (Microsoft DOS / Windows, Unix) |
.psp | Web server-executed source |
.awk | Awk (language) source |
.frm | Microsoft Visual Basic source |
.com | MS-DOS/NT, OS/2 Rexx , VMS commands |
.png | Microsoft Windows, X-Windows, portable bitmap file |
.c | C source |
.cpp, .cxx | C++ source |
.cs | C# source |
.class | Java compiled file |
.sh | Unix csh, sh (Bourne) shell source |
.def | Microsoft Windows or OS/2 executable or shared library definition |
.so | Shared object library (Microsoft Windows, Unix) |
.vbp | Microsoft Developer Studio project and workspace file |
.dvi | Documentation in troff or TeX device-independent output |
.el | Emacs lisp source |
.tbl | Equations, pictures, tables to be typeset by eqn, pic, tbl |
.Z | File compressed with gzip or compress |
.h | C or C++ header file |
.hpp | C++ header file |
.icon | Microsoft Windows, X-Windows icon |
.idl | Interface definition file |
.info | Documentation generated by GNU Texinfo |
.tar | File collection in Java, Unix shell, tape archive format |
.java | Java source code |
.jcl | IBM JCL instructions |
.l | Lex (lexical analyzer generator) source |
.m4 | M4 (macro processor) code |
.mk | Makefile (also often without any extension) |
.mif | Documentation exported by FrameMaker |
.me | Documentation using troff mm, me macros |
.roff | Documentation in plain troff format |
.obj | Object file |
.ok | Local additions to the spell-check dictionary |
.pod | Perl, module source, documentation |
.ps | Postscript source, or formatted documentation |
.py | Python source |
.rb | Ruby source |
.res | Microsoft Windows resource, compiled resource script |
.sed | Sed (stream editor) source |
.test | Test file |
.tcl | Tcl/Tk source |
.texi | Documentation in TeX or LaTeX, GNU Texinfo format |
.y | Yacc (parser generator) source |
Indentation
- Will help identify blocks
- Beware of difference between tab character and spaces
- Correct settings in editor/IDE important
void linkTable(Table t) throws SQLException {
String name = t.getName();
for (int i = 0; i < tTable.size(); i++) {
Table o = (Table) tTable.elementAt(i);
if (o.getName().equals(name)) {
throw Trace.error(Trace.TABLE_ALREADY_EXISTS, name);
}
}
- Non-standard values often appear as editor commands:
/*
* Local Variables:
* tab-width: 4
* c-indent-level: 4
* c-argdecl-indent: 4
* c-continued-statement-offset: 4
* c-continued-brace-offset: -4
* c-label-offset: -4
* c-brace-offset: 0
* End:
*/
- Interpreting indentation can save you from counting brackets:
if ((u_int)fd >= fdp->fd_nfiles ||
(fp = fdp->fd_ofiles[fd]) == NULL)
return (EBADF);
Formatting
- Two main schools:
- Opening brace on separate line (GNU, Linux, Windows):
if (ia->ia_maddr != MADDRUNK)
{
sc->pPacketPagePhys = ia->ia_maddr;
}
- Opening brace on same line (BSD Java)
if ((cp = strchr(*argv, ':')) != NULL) {
*cp++ = '\0';
a_gid(cp);
}
- Both are OK
- Inconsitency is a problem
xpupil.y = Xy(w->eyes.pupil[0].x, w->eyes.pupil[0].y, &w->eyes.t);
xnewpupil.x = Xx(newpupil[0].x, newpupil[0].y, &w->eyes.t);
xnewpupil.y = Xy(newpupil[0].x, newpupil[0].y, &w->eyes.t);
if (!XPointEqual (xpupil, xnewpupil)) {
- Why? All explanations spell trouble:
- Undisciplined programmer
- Haphazard integration
- Uncoordinated team
Comments
Comments are not only read by humans:
- The /*- sequence means "Don't format"
- Java /** comments are read by javadoc
- FALTRHOUGH comments mark continued case statements:
case 10: /* yy */
lt->tm_year = ATOI2(p);
if (lt->tm_year < 69) /* hack for 2000 ;-} */
lt->tm_year += 100;
/* FALLTHROUGH */
case 8: /* mm */
lt->tm_mon = ATOI2(p);
- NOTREACHED mark non-reachable code points
size_t
strcspn(const char *s1, const char *s2)
{
register const char *p, *spanp;
register char c, sc;
/*
* Stop as soon as we find any character from s2. Note
* that there must be a NUL in s2; it suffices to stop
* when we find that, too.
*/
for (p = s1;;) {
c = *p++;
spanp = s2;
do {
if ((sc = *spanp++) == c)
return (p - 1 - s1);
} while (sc != 0);
}
/* NOTREACHED */
}
- $Identifier$ sequences are expanded by the VCS
- XXX means something is probably wrong here
- TODO marks areas of further work
- FIXME marks areas of further enhancement
Naming Convention Styles
Three naming conventions:
- Capitalization, or CamelCase. Examples:
- SetErrorMode (Win32)
- RandomAccessFile (Java)
- Underscore separation (exponent_is_negative) (GNU)
- Initials (splbio : set processor level for buffered I/O);
also removal of vowels and word endings (strcmp)
Java Naming Conventions
- Package names start with a domain identifier (org, com)
- Class and interface names start with an uppercase character
- Variable, field, and method names start with a lowercase character
We can thus differentiate between static method calls and message dispatches:
- Library.register(hAlias)
- name.equals(".c")
Hungarian Notation
- Developed by Charles Symonyi
- Prevalent in Win32 code
- Composed by the sequence
- Tag (indicates type)
- Base name (distinguishes identifiers with same type)
- Qualifier (standard differentiators)
- Often abused to encode hardware-dependent types (e.g. WORD)
Primitive Types
Construct | Meaning |
b | Flag (boolean) primitive type |
dw | Double word (32 bit wide) with arbitrary contents |
w | Word with arbitrary contents (typically unsigned) |
b | Byte with arbitrary contents |
ch | Character |
sz | Pointer to first character of a null-terminated string |
h | Handle of a heap-allocated object |
fn | Function |
i | Integer |
l | Long integer (32-bit wide) |
n | Short integer (16-bit wide) |
Type Constructions
Construct | Meaning |
pX | Pointer to X |
dX | Difference between two instances of X |
cX | Count of instances of type X |
mpX Y | Array (map) of Ys indexed by X |
rgX | Array (range) of Xs |
iX | Index to array rg X |
cbX | Size of instances of X in bytes |
cwX | Size of instances of X in words |
Name Qualifiers
Construct | Meaning |
XFirst | First element of an ordered set of type X values |
XLast | Last element of an ordered set of type X values. |
XNext | Next element of an ordered set of type X values |
XPrev | Previous element of an ordered set of type X values |
XLim | The upper limit of a range of X values. |
XMac | The current upper limit |
XNil | A special Nil value of type X |
XT | A temporary value of type X |
VB Primitive Types
Construct | Meaning |
cur | Currency |
time | Time |
date | Date |
dt | Date and time combined |
qry | Database query |
tbl | Database table |
VB Common Controls
Construct | Meaning |
frm | Form |
mnu | Form menu |
cmd | Command button |
chk | Check button |
opt | Radio button |
lbl | Text label |
txt | Text edit box |
pb | Picture box |
pic | Picture |
lst | List box |
cbo | Combo box |
tmr | Timer |
Programming Practices
Established practices help you read standard code patterns,
and identify deviations.
Examples:
- Use getopt to parse command line arguments.
Standard:
while ((ch = getopt(argc, argv, "mo:ps:tx")) != -1)
switch(ch) {
[...]
case 'x':
if (!domd5)
requiremd5("-x");
MDTestSuite();
nomd5stdin = 1;
break;
case '?':
default:
usage();
}
argc -= optind;
argv += optind;
Deviation:
while (argc > 1 && argv[1][0] == '-') {
switch(argv[1][1]) {
[...]
case 's':
sflag = 1;
break;
default:
usage();
}
argc--; argv++;
}
nfiles = argc - 1;
- Declare structures together with a typedef (Win32).
typedef struct _OVERLAPPED {
DWORD Internal;
DWORD InternalHigh;
DWORD Offset;
DWORD OffsetHigh;
HANDLE hEvent;
} OVERLAPPED, *LPOVERLAPPED;
- Assume 32-bit (nowadays 64-bit) address space and virtual memory (GNU)
- Use STL containers (modern C++ software)
Process Standards
- Standard documents (e.g. manual page, texinfo, javadoc)
- Build process (tools and macros)
- Release process
- Distribution format and files
- Installer files
- Versioning and tagging
- Installation directories
- User configuration options
- Often enforced through automation
Further Reading
- L. W. Cannon, R. A. Elliott,
L. W. Kirchhoff, J. H. Miller, J. M. Milner, R. W. Mitze, E. P. Schan, N. O.
Whittington, Henry Spencer, David Keppel, and Mark Brader.
Recommended C style
and coding standards.
Available online http://sunland.gsfc.nasa.gov/info/cstyle.html (December 2001).
Updated version of the Indian Hill C Style and Coding Standards paper.
- The FreeBSD Project.
Style—Kernel Source File Style Guide, December 1995.
FreeBSD Kernel Developer's Manual: style(9). Available online at
http://www.freebsd.org (March 2003).
- Brian W. Kernighan
and Rob Pike.
The Practice of
Programming.
Addison-Wesley, Reading, MA, 1999.
- Brian W. Kernighan
and P. J. Plauger.
The Elements of
Programming Style.
McGraw-Hill, New York, second edition, 1978.
- Charles Simonyi.
Meta-programming:
a Software Production Method.
PhD thesis, Stanford University, CA, December 1976.
Online
http://www.parc.xerox.com/publications/bw-ps-gz/csl76-7.ps.gz
(December 2001).
- Charles Simonyi.
Hungarian
notation.
Available online
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnvsgen/html/hunganotat.asp
(December 2001), November 1999.
Microsoft Developer Network Library.
- Henry Spencer and
Geoff Collyer.
#ifdef considered harmful or portability experience with C news.
In Rick Adams, editor, USENIX Conference Proceedings, pages
185–198, Berkeley, CA, Summer 1992. USENIX Association.
- Henry Spencer.
How to steal code—or—inventing the wheel only once.
In USENIX Conference Proceedings, pages 335–345, Berkeley, CA,
Winter 1988. USENIX Association.
- Henry Spencer.
The Ten Commandments for C programmers (annotated edition).
;login:, 18(2):29–31, March slash April 1993.
- Diomidis Spinellis.
Code Reading: The Open
Source Perspective, pages 225–240.
Effective Software Development Series. Addison-Wesley, Boston, MA, 2003.
- Diomidis Spinellis.
Job
security.
IEEE Software, 26(5):14–15, Sep/Oct 2009.
(doi:10.1109/MS.2009.131 (http://dx.doi.org/10.1109/MS.2009.131))
- Diomidis Spinellis.
elyts
edoc.
IEEE Software, 28(2):104–103, March/April 2011.
(doi:10.1109/MS.2011.31 (http://dx.doi.org/10.1109/MS.2011.31))
- Richard Stallman
et al.
GNU coding
standards.
Available online http://www.gnu.org/prep/standards_toc.html
(December 2001), October 2001.
- Sun
Microsystems, Inc.
Java code conventions (http://java.sun.com/docs/codeconv/).
Available online http://java.sun.com/docs/codeconv/ (December 2001),
April 1999.
Exercises and Discussion Topics
-
Identify the coding standards that apply in your organization.
In the form of a table, compare their coverage against two other
commonly used standards.
-
Discuss the advantages and disadvantages of defining and declaring
code elements in a strictly prescribed order.
How do modern integrated development environments affect your view?
-
Devise a guideline that prescribes under what circumstances
the formatting of an imported piece of code should be changed to conform
to the local coding standards.
Take into account code quality, readability, maintenance, revision control,
and future integration issues.
-
Some argue that the Hungarian naming conventions duplicate work that is performed
by a type checking compiler.
Discuss.
-
Create an itemized list of programming practices that can help code readability.
Try, where possible, to reference existing examples.
-
Identify the process standards that apply in your organization.
Explain what software elements (files, directories) have to be examined
in order to verify the conformance of a system against those standards.
Documentation
Documentation Types
- System specification
- Software requirements specification
- Design specification
- Test specification
- User documentation
- Functional description
- Installation instructions
- Introductory guide, tutorial
- Reference manual
- Administrator manual
Often the only documentation available!
Shortcut for Code Understanding
Code
line = gobble = 0;
for (prev = '\n'; (ch = getc(fp)) != EOF; prev = ch) {
if (prev == '\n') {
if (ch == '\n') {
if (sflag) {
if (!gobble && putchar(ch) == EOF)
break;
gobble = 1;
continue;
}
[...]
}
}
gobble = 0;
[...]
}
Documentation
-s Squeeze multiple adjacent empty lines, causing the output to be
single spaced.
Specifications for Code Inspection
Code
(Apache)
switch (*method) {
case 'H':
if (strcmp(method, "HEAD") == 0)
return M_GET; /* see header_only in request_rec */
break;
case 'G':
if (strcmp(method, "GET") == 0)
return M_GET;
break;
case 'P':
if (strcmp(method, "POST") == 0)
return M_POST;
if (strcmp(method, "PUT") == 0)
return M_PUT;
if (strcmp(method, "PATCH") == 0)
return M_PATCH;
Specification
(RFC-2068)
The Method token indicates the method to be performed on the
resource identified by the Request-URI. The method is
case-sensitive.
Method = "OPTIONS" ; Section 9.2
| "GET" ; Section 9.3
| "HEAD" ; Section 9.4
| "POST" ; Section 9.5
| "PUT" ; Section 9.6
| "DELETE" ; Section 9.7
| "TRACE" ; Section 9.8
| extension-method
Obtain System Structure
Sendmail Files
arpadate.c, clock.c, collect.c, conf.c, convtime.c, daemon.c, deliver.c,
domain.c, envelope.c, err.c, headers.c, macro.c, main.c, map.c, mci.c,
mime.c, parseaddr.c, queue.c, readcf.c, recipient.c, safefile.c,
savemail.c, srvrsmtp.c, stab.c, stats.c, sysexits.c, trace.c, udb.c,
usersmtp.c, util.c, version.c,
Sendmail Documentation Headings
2.5. | Configuration file | readcf.c |
3.3.1. | Aliasing | alias.c |
3.4. | Message collection | collect.c |
3.5. | Message delivery | deliver.c |
3.6. | Queued messages | queue.c |
3.7. | Configuration | conf.c |
3.7.1. | Macros | macro.c |
3.7.2. | Header declarations | headers.c, envelope.c |
3.7.4. | Address rewriting rules | parseaddr.c |
Understand complicated algorithms
Code
for (arcp = memp->parents ; arcp ; arcp = arcp->arc_parentlist) {
[...]
if ( headp -> npropcall ) {
headp -> propfraction += parentp -> propfraction
* ( ( (double) arcp -> arc_count )
/ ( (double) headp -> npropcall ) );
}
}
Documentation
Obtain the Meaning of Source Code Identifiers
Code
#define TCPS_ESTABLISHED 4 /* established */
(Notice useless comment.)
Documentation
RFC-793
ESTABLISHED - represents an open connection, data received can be delivered to the user. The normal state for the data transfer phase of the connection.
+---------+ ---------\ active OPEN
| CLOSED | \ -----------
+---------+<---------\ \ create TCB
| ^ \ \ snd SYN
passive OPEN | | CLOSE \ \
------------ | | ---------- \ \
create TCB | | delete TCB \ \
V | \ \
+---------+ CLOSE | \
| LISTEN | ---------- | |
+---------+ delete TCB | |
rcv SYN | | SEND | |
----------- | | ------- | V
+---------+ snd SYN,ACK / \ snd SYN +---------+
| |<----------------- ------------------>| |
| SYN | rcv SYN | SYN |
| RCVD |<-----------------------------------------------| SENT |
| | snd ACK | |
| |------------------ -------------------| |
+---------+ rcv ACK of SYN \ / rcv SYN,ACK +---------+
| -------------- | | -----------
| x | | snd ACK
| V V
| CLOSE +---------+
| ------- | ESTAB |
| snd FIN +---------+
| CLOSE | | rcv FIN
V ------- | | -------
+---------+ snd FIN / \ snd ACK +---------+
| FIN |<----------------- ------------------>| CLOSE |
| WAIT-1 |------------------ | WAIT |
+---------+ rcv FIN \ +---------+
| rcv ACK of FIN ------- | CLOSE |
| -------------- snd ACK | ------- |
V x V snd FIN V
+---------+ +---------+ +---------+
|FINWAIT-2| | CLOSING | | LAST-ACK|
+---------+ +---------+ +---------+
| rcv ACK of FIN | rcv ACK of FIN |
| rcv FIN -------------- | Timeout=2MSL -------------- |
| ------- x V ------------ x V
\ snd ACK +---------+delete TCB +---------+
------------------------>|TIME WAIT|------------------>| CLOSED |
+---------+ +---------+
Rationale Behind Nonfunctional Requirements
Code
if (newdp->d_cred > dp->d_cred) {
/* better credibility.
* remove the old datum.
*/
goto delete;
}
Documentation
(P. Vixie's BIND Security Paper)
5.1. Cache Tagging
BIND now maintains for each cached RR a "credibility" level showing
whether the data came from a zone, an authoritative answer, an authority
section, or additional data section. When a more credible RRset comes in,
the old one is completely wiped out. Older BINDs blindly aggregated data
from all sources, paying no attention to the maxim that some sources
are better than others.
Design Intelligence
- System requirements
- Architecture
- Implementation
- Rejected alternatives
Case
Pike and Thompson on adopting UTF over 16-bit Unicode representation
in Plan 9:
Unicode defines an adequate character set but an unreasonable
representation. The Unicode standard states that all characters are 16
bits wide and are communicated in 16-bit units.... To adopt Unicode,
we would have had to convert all text going into and out of Plan
9 between ASCII and Unicode, which cannot be done. Within a single
program, in command of all its input and output, it is possible to
define characters as 16-bit quantities; in the context of a networked
system with hundreds of applications on diverse machines by different
manufacturers, it is impossible.
[...]
The UTF encoding has several good properties. By far the most important
is that a byte in the ASCII range 0-127 represents itself in UTF. Thus
UTF is backward compatible with ASCII.
Internal Programming Interfaces
Examples:
- Hsqldb HTML code documentation
- Perlguts
- FreeBSD kernel interfaces: manual volume 9
Test Cases and Examples of Actual Use
Examples from the tcpdump documentation:
To print all ftp traffic through internet gateway snup: (note that the
expression is quoted to prevent the shell from (mis-)interpreting the
parentheses):
tcpdump 'gateway snup and (port ftp or ftp-data)'
To print the start and end packets (the SYN and FIN packets) of each
TCP conversation that involves a non-local host.
tcpdump 'tcp[13] & 3 != 0 and not src and dst net localnet'
Implementation Problems and Bugs
at: limitations
At and batch as presently implemented are not suitable when users are
competing for resources. If this is the case for your site, you might
want to consider another batch system, such as nqs.
cat: caveats
Because of the shell language mechanism used to perform output
redirection, the command
"cat file1 file2 > file1"
will cause the
original data in file1 to be destroyed! This is performed by the shell
before cat is run.
strftime: humor
There is no conversion specification for the phase of the moon.
ctags: bugs
Recognition of functions, subroutines and procedures for FORTRAN and
Pascal is done in a very simpleminded way. No attempt is made to deal
with block structure; if you have two Pascal procedures in different
blocks with the same name you lose.
Development and Execution Environment Problems
// The following function is not inline, to avoid build (template
// instantiation) problems with Sun C++ 4.2 patch 104631-07/SunOS 5.6.
(Often comments are harsher)
Trouble Spots
2001-09-17 Urban [...]
* proc.c: Go back to the interruptible sleep as reconnects
seem to handle it now.
[...]
2001-07-09 Jochen [...]
* proc.c, ioctl.c: Allow smbmount to signal failure to reconnect
with a NULL argument to SMB-IOC-NEWCONN (speeds up error
detection).
[...]
2001-04-21 Urban [...]
* dir.c, proc.c: replace tests on conn-pid with tests on state
to fix smbmount reconnect on smb_retry timeout and up the
timeout to 30s.
[...]
2000-08-14 Urban [...]
* proc.c: don't do interruptable_sleep in smb_retry to avoid
signal problem/race.
[...]
1999-11-16 Andrew [...]
* proc.c: don't sleep every time with win95 on a FINDNEXT
Undocumented Features
Why?
- Not officially supported
- Provided only as a support mechanism for suitably trained engineers
- Experimental or intended for a future release
- Used by the product's vendor to gain an advantage over the competition
- Badly implemented
- A security threat
- Intended only for a subset of the users or product versions
- A Trojan horse, time bomb, or back door
- Oversight
Additional Documentation Sources
- Comments
- Standards
- Publications
- Test cases
- Mailing lists
- Newsgroups
- Revision logs
- Issue-tracking databases
- Marketing material
- The source code
Common Open-Source Documentation Formats
- troff
- texinfo
- DocBook
- javadoc
- Doxygen (C, C++, Java into HTML LaTeX)
Important: properly typeset the documentation for printing.
Further Reading
- Jon Louis Bentley,
Donald E. Knuth, and Douglas McIlroy.
A literate program.
Communications of the ACM, 19(6):471–483, June 1986.
- Rohan T. Douglas.
Error message management.
Dr. Dobb's Journal, 15(1):48–51, January 1990.
- Narain Gehani.
Document
Formatting and Typesetting on the UNIX System.
Silicon Press, Summit, NJ, second edition, 1987.
- Eric Hamilton.
Literate programming—expanding generalized regular expressions.
Communications of the ACM, 31(12):1376–1385, December 1988.
- David R. Hanson.
Literate programming—printing common words.
Communications of the ACM, 30(7):594–599, July 1987.
- Robert A. Heinlein.
Stranger in a Strange Land.
G. P. Putnam's Sons, New York, 1961.
- Michael A. Jackson.
Literate programming—processing transactions.
Communications of the ACM, 30(12):1000–1010, December 1987.
- Brian W. Kernighan.
A typesetter-independent TROFF.
Computer Science Technical Report 97, Bell Laboratories, Murray Hill, NJ, 1982.
Available online at http://cm.bell-labs.com/cm/cs/cstr.
- Donald E. Knuth and
Silvio Levy.
The CWEB
System of Structured Documentation.
Addison-Wesley, Reading, MA, 1993.
- Donald E. Knuth.
The
TeXbook.
Addison-Wesley, Reading, MA, 1989.
- Donald E. Knuth.
Literate
Programming.
CSLI Lecture Notes Number 27. Stanford University Center for the Study of
Language and Information, Stanford, CA, 1992.
Distributed by the University of Chicago Press.
- Mark F. Komarinski,
Jorge Godoy, and David C. Merrill.
LDP author
guide.
Available online http://www.linuxdoc.org/LDP/LDP-Author-Guide.pdf (January
2002), December 2001.
- Leslie Lamport.
LATEX: A
Document Preparation System.
Addison-Wesley, Reading, MA, second edition, 1994.
- Samuel J. Leffler,
Marshall Kirk McKusick, Michael J. Karels, and John S. Quarterman.
The Design and
Implementation of the 4.3BSD Unix Operating System.
Addison-Wesley, Reading, MA, 1988.
- J. F. Ossanna.
NROFF/TROFF user's manual.
In Unix Programmer's Manual [Unix Programmer's Manual, 1979].
Also available online http://plan9.bell-labs.com/7thEdMan/.
- Rob Pike and Ken
Thompson.
Hello world.
In Dan Geer, editor, USENIX Technical Conference Proceedings,
pages 43–50, Berkeley, CA, Winter 1993. Usenix Association.
- Eric S. Raymond.
The New
Hacker's Dictionary.
MIT Press, Cambridge, third edition, 1996.
- Diomidis Spinellis.
Code Reading: The Open
Source Perspective, pages 241–266.
Effective Software Development Series. Addison-Wesley, Boston, MA, 2003.
- Diomidis Spinellis.
Code
documentation.
IEEE Software, 27(4):18–19, July/August 2010.
(doi:10.1109/MS.2010.95 (http://dx.doi.org/10.1109/MS.2010.95))
- UNIX
Programmer's Manual. Volume 2—Supplementary Documents.
Bell Telephone Laboratories, Murray Hill, NJ, seventh edition, 1979.
Also available online http://plan9.bell-labs.com/7thEdMan/.
- Norman Walsh and
Leonard Muellner, editors.
DocBook: The
Definitive Guide.
O'Reilly and Associates, Sebastopol, CA, 1999.
- Christopher J. Van Wyk
and Donald C. Lindsay.
Literate programming: A file difference program.
Communications of the ACM, 32(6):740–755, June 1989.
Exercises and Discussion Topics
-
Select three large projects from the course's reference source code
and classify
the available documentation.
-
Comment on the applicability of the documentation types
we described in open-source development efforts.
-
Present an overview of the source organization of apache Web server
by examining the provided documentation.
-
Locate one instance of a published algorithm reference
in the course's reference source code.
Map the published version of the algorithm against its implementation.
-
Categorize and tabulate the types of problems described in the
Bugs section of the Unix manual pages and sort them
according to their frequency.
Discuss the results you obtained.
-
The course's reference source code
contains over 40 references to undocumented behavior.
Locate them and discuss the most common observed cause of documentation
discrepancies.
-
Compare the documentation formats we described on usability,
readability, features provided, and amenability to automated
processing by ad-hoc tools.
-
Locate and typeset on a high quality output device
each of the different documentation formats
available in the course's reference source code
in your local environment.
Discuss the difficulties you encountered.
Maintainability
Introduction
The Maintainability Index
Parameter | Name | Measures |
aveV | Average Halstead complexity | Computational density |
aveV(g') | Average extended cyclomatic complexity | Logical complexity |
aveLOC | Average count of lines of code | Code size |
PerCM | Average percent of lines of comments | Human insight |
Maintainability index over time in the FreeBSD user-space code.
Metrics for Object-oriented Programs
- WMC: Weighted methods per class
- DIT: Depth of Inheritance Tree
- NOC: Number of Children
- CBO: Coupling between object classes
- RFC: Response for a Class
- LCOM: Lack of cohesion in methods
Dependency Metrics on Packages
- Efferent (outward) couplings measure
the number of packages the package we are examining depends on.
- Afferent (inward) couplings measure
the number of other packages that depend on the package we are examining.
-
Stable Dependencies Principle
dependencies in our packages should follow the direction of stability:
less stable packages should depend on more stable ones.
An unstable package in Tomcat
A stable package in the Eclipse distribution
Analyzability
- Consistency
- Expression Formatting
- Statement Formatting
- Naming Conventions
- Statement-level Comments
- Versioning Comments
- Visual Structure: Blocks and Indentation
- Length of Expressions, Functions, and Methods
- Control Structures
- Boolean Expressions
- Recognizability and Cohesion
- Dependencies and Coupling (later)
- Code Block Comments
- Data Declaration Comments
- Appropriate Identifier Names
- Locality of Dependencies
- Ambiguity
- Reviewability
Analyzability: Dependencies and Coupling
- Data Coupling: data passed to a routine
- Stamp Coupling: data passed to a routine, but only small part is used
- Control Coupling: data affects control decisions
- Temporal Coupling: behavior depends on the order elements are called (also lack of reentrancy)
- Common Coupling: sharing of data through global variables
- External Coupling: implicit sharing of knowledge of data formats or protocols
- Content Coupling: one module depends or modifies internal workings of another
Changeability
- Identification
- Separation
Stability
- Encapsulation and Data Hiding
- Data Abstraction
- Type Checking
- Compile-time Assertions
- Run-time Checks and Inspection-time Assertions
Testability
- Unit Testing
- Integration Testing
- System Testing
- Test Coverage Analysis
- Incidental Testing
Effects of the Development Environment
- Incremental Builds
- Tuning Build Performance
Further Reading
- Dan Ash, John Alderete,
Paul W. Oman, and Bruce Lowther.
Using software
maintainability models to track code health.
In ICSM '94: Proceedings of the International Conference on Software
Maintenance, pages 154–160, Washington, DC, September 1994. IEEE
Computer Society.
- David Astels.
Test Driven
Development: A Practical Guide.
Prentice Hall, Englewood Cliffs, NJ, 2003.
- Rajendra K. Bandi,
Vijay K. Vaishnavi, and Daniel E. Turk.
Predicting maintenance performance using object-oriented design complexity
metrics.
IEEE Transactions on Software Engineering, 29(1):77–87, 2003.
(doi:10.1109/TSE.2003.1166590 (http://dx.doi.org/10.1109/TSE.2003.1166590))
- Rajiv D. Banker,
Srikant M. Datar, Chris F. Kemerer, and Dani Zweig.
Software complexity and maintenance costs.
Communications of the ACM, 36(11):81–94, 1993.
(doi:10.1145/163359.163375 (http://dx.doi.org/10.1145/163359.163375))
- Victor R. Basili,
Lionel C. Briand, and Walcélio L. Melo.
A validation of object-oriented design metrics as quality indicators.
IEEE Transactions on Software Engineering, 22(10):751–761,
1996.
(doi:10.1109/32.544352 (http://dx.doi.org/10.1109/32.544352))
- Gerald M. Berns.
Assessing software maintainability.
Communications of the ACM, 27(1):14–23, 1984.
(doi:10.1145/69605.357965 (http://dx.doi.org/10.1145/69605.357965))
- Terry Bollinger,
Jeffrey Voas, and Maarten Boasson.
Persistent software attributes.
IEEE Software, 21(6):16–18, November/December 2004.
- L. W. Cannon, R. A. Elliott,
L. W. Kirchhoff, J. H. Miller, J. M. Milner, R. W. Mitze, E. P. Schan, N. O.
Whittington, Henry Spencer, David Keppel, and Mark Brader.
Recommended C style
and coding standards.
Available online http://sunland.gsfc.nasa.gov/info/cstyle.html (http://sunland.gsfc.nasa.gov/info/cstyle.html) (January
2006).
Updated version of the Indian Hill C Style and Coding Standards paper.
- S. N. Cant, D. R. Jeffery,
and B. L. Henderson-Sellers.
A conceptual model of cognitive complexity of elements of the programming
process.
Information and Software Technology, 37(7):351–362, June
1995.
- Shyam R. Chidamber
and Chris F. Kemerer.
A metrics suite for object oriented design.
IEEE Transactions on Software Engineering, 20(6):476–493, 1994.
(doi:10.1109/32.295895 (http://dx.doi.org/10.1109/32.295895))
- Shyam R. Chidamber,
David P. Darcy, and Chris F. Kemerer.
Managerial use of metrics for object-oriented software: An exploratory
analysis.
IEEE Transactions on Software Engineering, 24(8):629–639, 1998.
(doi:10.1109/32.707698 (http://dx.doi.org/10.1109/32.707698))
- Don Coleman, Dan Ash,
Bruce Lowther, and Paul W. Oman.
Using metrics to evaluate software system maintainability.
Computer, 27(8):44–49, 1994.
(doi:10.1109/2.303623 (http://dx.doi.org/10.1109/2.303623))
- Andrea De Lucia,
Eugenio Pompella, and Silvio Stefanucci.
Effort estimation for
corrective software maintenance.
In Proceedings of the 14th International Conference on Software
Engineering and Knowledge Engineering (SEKE '04), pages 409–416, New
York, July 2002. ACM Press.
(doi:10.1145/568760.568831 (http://dx.doi.org/10.1145/568760.568831))
- Jonathan Edwards.
Subtext: Uncovering the simplicity of programming.
In Ralph Johnson and Richard P. Gabriel, editors, OOPSLA '05: Proceedings
of the 20th Annual ACM SIGPLAN Conference on Object-Oriented Programming,
Systems, Languages, and Applications, pages 505–518, New York,
October 2005. ACM Press.
(doi:10.1145/1094851 (http://dx.doi.org/10.1145/1094851))
- Michael E. Fagan.
Design and
code inspections to reduce errors in program development.
IBM Systems Journal, 15(3):182–211, 1976.
- Rudolf Ferenc, István
Siket, and Tibor Gyimóthy.
Extracting facts from
open source software.
In ICSM '04: Proceedings of the 20th IEEE International Conference on
Software Maintenance (ICSM'04), pages 60–69, Washington, DC,
September 2004. IEEE Computer Society.
- The FreeBSD Project.
Style—Kernel Source File Style Guide, December 1995.
FreeBSD Kernel Developer's Manual: style(9). Available online
http://www.freebsd.org/docs.html (http://www.freebsd.org/docs.html) (January 2006).
- Erich Gamma, Richard
Helm, Ralph Johnson, and John Vlissides.
Design
Patterns: Elements of Reusable Object-Oriented Software.
Addison-Wesley, Reading, MA, 1995.
- Geoffrey K. Gill and
Chris F. Kemerer.
Cyclomatic complexity density and software maintenance productivity.
IEEE Transactions on Software Engineering, 17(12):1284–1288,
1991.
(doi:10.1109/32.106988 (http://dx.doi.org/10.1109/32.106988))
- Robert L. Glass.
Building
Quality Software.
Prentice Hall, Upper Saddle River, NJ, 1992.
- Ceki Gülü.
log4j: The
Complete Manual.
QOS.ch, Lausanne, Switzerland, 2003.
- Jane Huffman Hayes, Naresh
Mohamed, and Tina Hong Gao.
Observe-mine-adopt (OMA): an agile way to enhance software maintainability.
Journal of Software Maintenance, 15(5):297–323, 2003.
- Brian L.
Henderson-Sellers, Larry L. Constantine, and Ian M. Graham.
Coupling and cohesion: Towards a valid metrics suite for object-oriented
analysis and design.
Object Oriented Systems, 3(3):143–158, 1996.
- Brian L.
Henderson-Sellers.
Object-Oriented Metrics: Measures of Complexity.
Prentice Hall, Englewood Cliffs, NJ, 1996.
- Martin Hitz and
Behzad Montazeri.
Chidamber and Kemerer's metrics suite: A measurement theory perspective.
IEEE Transactions on Software Engineering, 22(4):267–271, 1996.
(doi:10.1109/32.491650 (http://dx.doi.org/10.1109/32.491650))
- Elliott Hughes.
Checking spelling in source code.
ACM SIGPLAN Notices, 39(12):32–38, December 2004.
- Andrew Hunt and David
Thomas.
The Pragmatic
Programmer: From Journeyman to Master.
Addison-Wesley, Boston, MA, 2000.
- Andy Hunt and Dave
Thomas.
OO in
one sentence: Keep it DRY, shy, and tell the other guy.
IEEE Software, 21(3):101–103, May/June 2004.
- Institute of Electrical and Electronics Engineers.
IEEE Standard for Software Unit Testing.
IEEE, New York, 1987.
IEEE Standard 1008-1987.
- Institute of Electrical and Electronics Engineers.
Glossary of Software Engineering Terminology.
IEEE, New York, 1990.
IEEE Standard 610.12-1990.
- Institute of Electrical and Electronics Engineers.
Information Technology — Software Packages — Quality Requirements and
Testing.
IEEE, New York, 1998.
IEEE Standard 1465-1998 (ISO/IEC 12119:1998).
- Institute of Electrical and Electronics Engineers.
Software Maintenance.
IEEE, New York, 1998.
IEEE Standard 1219-1998.
- Institute of Electrical and Electronics Engineers.
Software Test Documentation.
IEEE, New York, 1998.
IEEE Standard 829-1998.
- International Organization for Standardization.
Information Technology — Vocabulary — Part 14: Reliability,
Maintainability and Availability.
ISO, Geneva, Switzerland, 1997.
ISO/IEC2382-14.
- International Organization for Standardization.
Software Engineering — Product Quality — Part 1: Quality
Model.
ISO, Geneva, Switzerland, 2001.
ISO/IEC 9126-1:2001(E).
- Paul C. Jorgensen.
Software
Testing: A Craftsman's Approach.
CRC Press, Boca Raton, FL, 2002.
- Stephen H. Kan.
Metrics and
Models in Software Quality Engineering.
Addison-Wesley, Boston, MA, second edition, 2002.
- Cem Kaner, Jack Falk, and
Hung Quoc Nguyen.
Testing
Computer Software.
Wiley, New York, second edition, 1999.
- Brian W. Kernighan
and Rob Pike.
The Practice of
Programming.
Addison-Wesley, Reading, MA, 1999.
- Brian W. Kernighan
and P. J. Plauger.
The Elements of
Programming Style.
McGraw-Hill, New York, second edition, 1978.
- John Lakos.
Large-Scale
C++ Software Design.
Addison-Wesley, Reading, MA, 1996.
- Craig Larman.
Applying UML
and Patterns: An Introduction to Object-Oriented Analysis and Design and the
Unified Process.
Prentice Hall, Upper Saddle River, NJ, third edition, 2004.
- Meir M. Lehman and
Laszlo A. Belady.
An introduction to program growth dynamics.
In W. Freiberger, editor, Conference on Statistical Computer Performance
Evaluation Proceedings, pages 503–511, New York, 1972. Academic
Press.
- Karl Lieberherr
and Ian Holland.
Assuring good style for object-oriented programs (ftp://ftp.ccs.neu.edu/pub/research/demeter/documents/papers/LH89-law-of-demeter.ps).
IEEE Software, 6(5):38–48, September 1989.
- Robert C. Martin.
Agile Software
Development: Principles, Patterns, and Practices.
Prentice Hall, Upper Saddle River, NJ, 2003.
- Tobias Mayer and Tracy
Hall.
A critical analysis of current OO design metrics.
Software Quality Control, 8(2):97–110, 1999.
(doi:10.1023/A:1008900825849 (http://dx.doi.org/10.1023/A:1008900825849))
- Steve C McConnell.
Code Complete:
A Practical Handbook of Software Construction.
Microsoft Press, Redmond, WA, second edition, 2004.
- George A. Miller.
The magical number seven, plus
or minus two: Some limits on our capacity for processing information.
Psychological Review, 63:81–97, 1956.
Also available online http://psychclassics.yorku.ca/Miller/ (http://psychclassics.yorku.ca/Miller/) (December
2005).
- Trevor Misfeldt,
Gregory Bumgardner, and Andrew Gray.
The Elements of
C++ Style.
Cambridge University Press, Cambridge, 2004.
- Paul W. Oman and
Jack Hagemeister.
Construction and testing of polynomials predicting software maintainability.
J. Syst. Softw., 24(3):251–266, 1994.
(doi:10.1016/0164-1212(94)90067-1 (http://dx.doi.org/10.1016/0164-1212(94)90067-1))
- David L. Parnas, A. John
van Schouwen, and Shu Po Kwan.
Evaluation of safety-critical software.
Communications of the ACM, 33(6):636–648, June 1990.
(doi:10.1145/78973.78974 (http://dx.doi.org/10.1145/78973.78974))
- David Lorge Parnas.
On the criteria to be used for decomposing systems into modules.
Communications of the ACM, 15(12):1053–1058, December 1972.
Also in [HW01] pp. 145–155.
- David L. Parnas.
Software aging.
In 16th International Conference on Software Engineering, ICSE
'94, pages 279–287, Washington, DC, May 1994. IEEE Computer Society.
Also in cite [Chapter 29]HW01.
- Sandeep Purao and
Vijay K. Vaishnavi.
Product metrics for object-oriented systems.
ACM Computing Surveys, 35(2):191–221, 2003.
(doi:10.1145/857076.857090 (http://dx.doi.org/10.1145/857076.857090))
- Cristiane S. Ramos,
Káthia M. Oliveira, and Nicolas Anquetil.
Legacy software evaluation model for outsourced maintainer.
In Eighth Euromicro Working Conference on Software Maintenance and
Reengineering (CSMR'04), pages 48–57, Washington, DC, March 2004.
IEEE Computer Society.
- Linda H. Rosenberg,
Ruth Stapko, and Al Gallo.
Applying
object-oriented metrics.
In Sixth International Symposium on Software Metrics—Measurement for
Object-Oriented Software Projects Workshop, November 1999.
Presentation available online
http://www.software.org/metrics99/rosenberg.ppt (http://www.software.org/metrics99/rosenberg.ppt) (January 2006).
- Linda H. Rosenberg,
Ruth Stapko, and Al Gallo.
Risk-based object oriented testing (http://sel.gsfc.nasa.gov/website/sew/1999/topics/rosenberg_SEW99paper.pdf).
In Twenty-Fourth Annual Software Engineering Workshop, Greenbelt,
MD, December 1999. NASA, Software Engineering Laboratory.
Available online
http://sel.gsfc.nasa.gov/website/sew/1999/topics/rosenberg_SEW99paper.pdf (http://sel.gsfc.nasa.gov/website/sew/1999/topics/rosenberg_SEW99paper.pdf) (January 2006).
- Linda H. Rosenberg.
Applying and
interpreting object oriented metrics.
In Software Technology Conference '98, April 1998.
Available online http://www.literateprogramming.com/ooapply.pdf (http://www.literateprogramming.com/ooapply.pdf) (January
2006).
- Ioannis Samoladas,
Ioannis Stamelos, Lefteris Angelis, and Apostolos Oikonomou.
Open source software development should strive for even greater code
maintainability.
Communications of the ACM, 47(10):83–87, 2004.
(doi:10.1145/1022594.1022598 (http://dx.doi.org/10.1145/1022594.1022598))
- Jim Shore.
Fail fast.
IEEE Software, 21(5):21–25, September/October 2004.
- Diomidis Spinellis.
Code Quality: The Open Source Perspective, chapter 7.
Addison-Wesley, Boston, MA, 2006.
- Richard Stallman
et al.
GNU coding
standards.
Available online http://www.gnu.org/prep/standards/ (http://www.gnu.org/prep/standards/) (January 2006),
December 2005.
- Wayne Stevens, Glenford
Myers, and Larry L. Constantine.
Structured design.
IBM Systems Journal, 13(2):115–139, May 1974.
- Sun
Microsystems, Inc.
Java code conventions (http://java.sun.com/docs/codeconv/).
Available online http://java.sun.com/docs/codeconv/ (http://java.sun.com/docs/codeconv/) (January 2006),
September 1997.
- Allan Vermeulen,
Scott W. Ambler, Gregory Bumgardner, Eldon Metz, Trevor Misfeldt, Jim Shur,
and Patrick Thompson.
The Elements of
Java Style.
Cambridge University Press, Cambridge, 2000.
- Marek
Vokáč, Walter Tichy, Dag I. K. Sjøberg, Erik Arisholm, and Magne
Aldrin.
A controlled experiment comparing the maintainability of programs designed with
and without design patterns—a replication in a real programming
environment.
Empirical Software Engineering, 9(3):149–195, 2004.
(doi:10.1023/B:EMSE.0000027778.69251.1f (http://dx.doi.org/10.1023/B:EMSE.0000027778.69251.1f))
- Kurt D. Welker and
Paul W. Oman.
Software
maintainability metrics models in practice.
Crosstalk — The Journal of Defense Software Engineering,
8(11):19–23, November/December 1995.
- Karl E. Wiegers.
Peers Review in
Software: A Practical Guide.
Addison-Wesley, Boston, MA, 2001.
- Ping Yu, Tarja Systä, and
Hausi A. Müller.
Predicting
fault-proneness using OO metrics: An industrial case study.
In T. Gyimóthy, editor, CSMR '02: Proceedings of the 6th European
Conference on Software Maintenance and Reengineering, pages 99–107,
Washington, DC, March 2002. IEEE Computer Society.
Exercises and Discussion Topics
-
How can a metric like the maintainability index
be used profitably within the software development process,
without creating the wrong incentives for programmers?
-
Go through the project you've been working on and tighten
the access control so that all elements (classes, class members, functions)
are declared with the most conservative visibility.
-
Write a test harness to test your system's random number generator.
How ``random'' are the numbers it provides?
-
Download the
ckjm (http://www.spinellis.gr/sw/ckjm/) (Chidamber and Kemerer Java Metrics)
metrics tool,
and apply it to code in your environment.
Select some classes that would be flagged using the criteria we described
and explain whether there is indeed a problem with their design, and,
if there is, how you propose to improve it.
Course Projects
Deadlines
- March 3rd, 2013
- Team forms due
- March 16th, 2013 23:59
- Project forms due
- March 20th, 2013
- Existing project presentation (must demonstrate
ability to compile and run the project)
- April 3rd, 2013
- Design presentation
- June 5th, 2013
- Implementation presentation
Unless otherwise noted,
all deadlines refer to the time the course lecture begins on the corresponding
day (e.g. 15:00) or the end of the day (23:59) in Athens local time.
Late entries will affect the project's grade.
Team Form
Please complete the form
here (http://spreadsheets.google.com/viewform?formkey=cHNZX0p6aXNWQk92ZlZYOEo1VmltYmc6MA..).
Project Form
Please complete the form
here (http://spreadsheets.google.com/viewform?formkey=cHNZX0p6aXNWQk90WDA2ZkFoWlFQSVE6MA..).
2013 teams
A Planet of the teams' blogs can be found
here (http://sofos.dmst.aueb.gr/blog/).
- KoMatos : Konstantinos Matos
- RourouZahari : Ekaterina Zaharinova Ελένη Παπαγιαννάκου
- Open Source Fantastic Programmers ( O.S.F.P ) : ΚΩΣΤΑΣ ΜΠΟΥΛΤΑΔΑΚΗΣ ΘΕΟΔΟΣΗΣ ΑΓΓΕΛΗΣ
- Discovery : Μαζίν Χουσεϊν Δημήτριος - Θωμάς Δάμτσιας
- projectMG : Μαρία Πετρίτη Γιώργος Κόκκος
- det-end : Γεωργία Βλάχου
- Deep Coders : ΣΠΥΡΙΔΩΝ ΜΑΝΔΕΚΗΣ ΑΝΑΣΤΑΣΙΟΣ ΠΑΤΡΗΣ
- Codeplay : Ιωάννης Δούκας Μαρία Γιουκαρή
- Detarios : Σπύρος Μιχαλίτσης
- CookieDevs : Γιώργος Τσίρος Αρτέμης Τσικιρίδης
- The Javengers : Γιώργος Σκιαδόπουλος Παναγιώτης Κουτσαυτίκης
- : Φραγκούλης Δημήτρης Τόπι Σωτήρης
- Fantasy : Ιωάννης Κωτσοβίλης Μαρία Μπιλίρη
- OnePersonTeam2013 : ΜΕΛΑΝΘΩ ΧΡΙΣΤΙΝΑ ΒΑΛΣΑΜΑΚΗ
- shield : Αλέξανδρος Ρίζος
- iHaveNo-ideaBrah : Χαράλαμπος Πελιβανίδης - -
- P. Chatzina : Παναγιώτα Χατζίνα
- The Source : Κώστας Παπαστάμος
- DETvelopers : IOANNIS MAKRIDIS IASON TRIGONIS
- iΔΕΤ : Δημήτρης Μηλιώτης Βενιζέλος Σταγάκης
- Detarious : ΠΑΥΛΟΣ ΣΙΑΠΕΡΑΣ ΜΑΡΙΑΝΝΑ ΚΟΛΩΝΙΑ
- Code Dream Team : ΙΝΑ ΛΙΤΣΟ ΑΤΡΕΣΤΗΣ ΚΑΡΑΛΗΣ
- Ομάδα Μηδέν : Χριστίνα Μελά
- OSAngels : Μερόπη Αργυροπούλου Αγγελική Ρωμανού
- heeelp-team : Ιωάννα Μανουσάκη
- The Walking DET : ΓΕΩΡΓΙΟΣ ΠΑΤΣΙΑΟΥΡΑΣ
- h-pow : UNGUREANU SERGIU
- Team A : Konstantinos Kalpakloglou
- Innovation : Βαλέριος Χταζηγεωργίου
- teamsolo : ΑΝΤΜΙΡ ΝΤΕΜΙΡΑΪ
Hall of Fame
The Class of 2004
Nikos Korfiatis
Viky Skordali
Panagiotis Toumasis
The Class of 2005
Achilleas Anagnostopoulos
Alexandros Charos
Dalia Martisiute
Zbyněk Mužík
Athina Spiliopoulou
Polina Tzavala
The Class of 2006
Jon Igual Brun and Guillermo Barbero Maiz
John Georgiou and George Paraskevopoulos
Argyro Kazaki
Ioannis Sermetziadis
Giorgos Veiogloannis
Costas Alexandropoulos
(Honourable mention.)
Kelly Mantagou and Ioanna Mantzouratou
(Honourable mention.)
The Class of 2007
Panagiotis Adamopoulos and Georgia-Bilma Todri
Appendix: Basic Programming Elements
A Complete Program
/* $NetBSD: echo.c,v 1.7 1997/07/20 06:07:03 thorpej Exp $ */
/*
* Copyright (c) 1989, 1993
* The Regents of the University of California. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
* 3. All advertising materials mentioning features or use of this software
* must display the following acknowledgement:
* This product includes software developed by the University of
* California, Berkeley and its contributors.
* 4. Neither the name of the University nor the names of its contributors
* may be used to endorse or promote products derived from this software
* without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*/
#include <sys/cdefs.h>
#ifndef lint
__COPYRIGHT(
"@(#) Copyright (c) 1989, 1993\n\
The Regents of the University of California. All rights reserved.\n");
#endif /* not lint */
#ifndef lint
#if 0
static char sccsid[] = "@(#)echo.c 8.1 (Berkeley) 5/31/93";
#else
__RCSID("$NetBSD: echo.c,v 1.7 1997/07/20 06:07:03 thorpej Exp $");
#endif
#endif /* not lint */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int __Pmain ((int, char *[]));
main(argc, argv)
int argc;
char *argv[];
{
int nflag;
/* This utility may NOT do getopt(3) option parsing. */
if (*++argv && !strcmp(*argv, "-n")) {
++argv;
nflag = 1;
}
else
nflag = 0;
while (*argv) {
(void)printf("%s", *argv);
if (*++argv)
putchar(' ');
}
if (!nflag)
putchar('\n');
exit(0);
}
Important issues:
- Headers
- Old-style declarations vs.
int main(int argc, char *argv[]);
- getopt
- strcmp return value. Common idiom:
#define STREQ(a, b) (*(a) == *(b) && strcmp((a), (b)) == 0)
- Java argv processing:
if (isConfig) {
configFile = args[i];
isConfig = false;
} else if (args[i].equals("-config")) {
isConfig = true;
} else if (args[i].equals("-debug")) {
debug = true;
} else if (args[i].equals("-nonaming")) {
- Variable initialization (nflag)
- printf and format specifications
- printf and putchar return value
if (ferror(stdout))
err(1, "stdout");
Expand: A Larger Example
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include <unistd.h>
/*
* expand - expand tabs to equivalent spaces
*/
int nstops;
int tabstops[100];
static void getstops __P((char *));
int main __P((int, char **));
static void usage __P((void));
int
main(argc, argv)
int argc;
char *argv[];
{
int c, column;
int n;
/* handle obsolete syntax */
while (argc > 1 && argv[1][0] && isdigit(argv[1][1])) {
getstops(&argv[1][1]);
argc--; argv++;
}
while ((c = getopt (argc, argv, "t:")) != -1) {
switch (c) {
case 't':
getstops(optarg);
break;
case '?':
default:
usage();
/* NOTREACHED */
}
}
argc -= optind;
argv += optind;
do {
if (argc > 0) {
if (freopen(argv[0], "r", stdin) == NULL) {
perror(argv[0]);
exit(1);
}
argc--, argv++;
}
column = 0;
while ((c = getchar()) != EOF) {
switch (c) {
case '\t':
if (nstops == 0) {
do {
putchar(' ');
column++;
} while (column & 07);
continue;
}
if (nstops == 1) {
do {
putchar(' ');
column++;
} while (((column - 1) % tabstops[0]) != (tabstops[0] - 1));
continue;
}
for (n = 0; n < nstops; n++)
if (tabstops[n] > column)
break;
if (n == nstops) {
putchar(' ');
column++;
continue;
}
while (column < tabstops[n]) {
putchar(' ');
column++;
}
continue;
case '\b':
if (column)
column--;
putchar('\b');
continue;
default:
putchar(c);
column++;
continue;
case '\n':
putchar(c);
column = 0;
continue;
}
}
} while (argc > 0);
exit(0);
}
static void
getstops(cp)
char *cp;
{
int i;
nstops = 0;
for (;;) {
i = 0;
while (*cp >= '0' && *cp <= '9')
i = i * 10 + *cp++ - '0';
if (i <= 0 || i > 256) {
bad:
fprintf(stderr, "Bad tab stop spec\n");
exit(1);
}
if (nstops > 0 && i <= tabstops[nstops-1])
goto bad;
tabstops[nstops++] = i;
if (*cp == 0)
break;
if (*cp != ',' && *cp != ' ')
goto bad;
cp++;
}
}
static void
usage()
{
(void)fprintf (stderr, "usage: expand [-t tablist] [file ...]\n");
exit(1);
}
Declarations and visibility
int nstops;
int tabstops[100];
static void getstops (char *);
int main (int, char **);
static void usage(void);
Issues:
- Global functions and variables
- Declare before use
- Use of the static keyword
Functions and Global Variables
Functions can help you identify a program's major parts.
To find what a function does:
Guess, based on the function name.
Read the comment at the beginning of the function.
Examine how the function is used.
Read the code in the function body.
Consult external program documentation.
while Loops, Conditions, and Blocks
while ((c = getopt (argc, argv, "t:")) != -1) {
switch (c) {
case 't':
getstops(optarg);
break;
case '?':
default:
usage();
/* NOTREACHED */
}
}
Issues:
switch Statements
- Example: expand getopt processing
- The use of fallthrough:
case 'a':
fts_options |= FTS_SEEDOT;
/* FALLTHROUGH */
case 'A':
f_listdot = 1;
break;
- Defensive handling of default statements:
switch (program) {
case ATQ:
/* [...] */
case BATCH:
writefile(time(NULL), 'b');
break;
default:
panic("Internal error");
break;
}
for Loops
- Typical case:
for (i = 0; i < len; i++) {
- Atypical cases
-
for ( i = 0; i <= extrknt; i++ )
-
for (i = 1; i < month; i++)
-
for (i = 1; i <= nargs; i++)
-
for (code = 255; code >= 0; code--) {
- Library function loops:
if ((dd = opendir(dir)) == NULL)
return (CC_ERROR);
for (dp = readdir(dd); dp != NULL; dp = readdir(dd)) {
- Use of multiple expressions:
for (cnt = 1, t = p; cnt <= cnt_orig; ++t, ++cnt) {
- Infinite loop:
for (;;) {
s = (state_t) (*s)();
quiet = 0;
}
Loop Control Statements
- break: exit loop
- continue: restart loop
- Perl equivalents: last, next
- continue as placeholder
for (; *string && isdigit(*string); string++)
continue;
- Java's labeled loops and control statements:
skip:
for ( /* [...] */ ) {
if ( ch == limit.charAt(0) ) {
for (int i = 1 ; i < limlen ; i++) {
if ( /* [...] */ )
continue skip;
}
return ret;
}
}
- Used for clarity:
comp : while(prev < length) {
/* [...] */
if (pos >= length || pos == -1) {
/* [...] */
break comp;
}
}
Character Expressions
- Range membership:
while ('0' <= *cp && *cp <= '9')
- Case conversion:
if ('a' <= *s && *s <= 'z')
*s -= ('a' - 'A');
- Both examples above illustrate non-portable code.
- A more complex example:
i = 0;
while (*cp >= '0' && *cp <= '9')
i = i * 10 + *cp++ - '0';
Boolean Expressions
goto Statements
- Can lead to spaghetti code
- Not supported in Java
- Used to exit after an error:
for (;;) {
i = 0;
while (*cp >= '0' && *cp <= '9')
i = i * 10 + *cp++ - '0';
if (i <= 0 || i > 256) {
bad:
fprintf(stderr, "Bad tab stop spec\n");
exit(1);
}
if (nstops > 0 && i <= tabstops[nstops-1])
goto bad;
tabstops[nstops++] = i;
if (*cp == 0)
break;
if (*cp != ',' && *cp != ' ')
goto bad;
cp++;
}
- Execute code again:
struct servent *
getservent()
{
char *p;
register char *cp, **q;
if (servf == NULL && (servf = fopen(_PATH_SERVICES, "r" )) == NULL)
return (NULL);
again:
if ((p = fgets(line, BUFSIZ, servf)) == NULL)
return (NULL);
if (*p == '#')
goto again;
cp = strpbrk(p, "#\n");
if (cp == NULL)
goto again;
*cp = '\0';
serv.s_name = p;
p = strpbrk(p, " \t");
if (p == NULL)
goto again;
*p++ = '\0';
while (*p == ' ' || *p == '\t')
p++;
cp = strpbrk(p, ",/");
- Exit from deep control structures (instead of break):
for (;;) {
/* [...] */
/* Gather incoming message bytes if needed. */
if ((sc->sc_state & NCR_DROP_MSGIN) == 0) {
if (n >= NCR_MAX_MSG_LEN) {
ncr_sched_msgout(sc, SEND_REJECT);
sc->sc_state |= NCR_DROP_MSGIN;
} else {
*sc->sc_imp++ = *sc->sci_data;
n++;
/*
* This testing is suboptimal, but most
* messages will be of the one byte variety, so
* it should not affect performance
* significantly.
*/
if (n == 1 && IS1BYTEMSG(sc->sc_imess[0]))
goto have_msg;
if (n == 2 && IS2BYTEMSG(sc->sc_imess[0]))
goto have_msg;
if (n >= 3 && ISEXTMSG(sc->sc_imess[0]) &&
n == sc->sc_imess[1] + 2)
goto have_msg;
}
}
/* [...] */
/* back to nextbyte */
}
have_msg:
/* We now have a complete message. Parse it. */
Refactoring in the Small
- Keep code's functional properties, while improving non-functional
properties
- Readability
- Maintainability
- Portability
- Reliability
- Can be performed in the large (architecture), or in the small (code elements)
Illustrative Example
if (*cp == 0)
break;
if (*cp != ',' && *cp != ' ')
goto bad;
cp++;
Changed into:
if (*cp == 0)
break;
else if (*cp != ',' && *cp != ' ')
goto bad;
else
cp++;
Refactoring Example
Original Code
(worms game)
op = &(!x ? (!y ? upleft : (y == bottom ? lowleft : left)) :
(x == last ? (!y ? upright : (y == bottom ? lowright : right)) :
(!y ? upper : (y == bottom ? lower : normal))))[w->orientation];
What is going one here?
More Readable Representation
(Formatted as cascading if-else)
op = &(
!x ? (
!y ?
upleft
: ( y == bottom ?
lowleft
:
left
)
) : ( x == last ? (
!y ?
upright
: ( y == bottom ?
lowright
:
right
)
) : (
!y ?
upper
: ( y == bottom ?
lower
:
normal
)
)
))[w->orientation];
Re-implementation
struct options *locations[3][3] = {
{upleft, upper, upright},
{left, normal, right},
{lowleft, lower, lowright},
};
int xlocation, ylocation;
if (x == 0)
xlocation = 0;
else if (x == last)
xlocation = 2;
else
xlocation = 1;
if (y == 0)
ylocation = 0;
else if (y == bottom)
ylocation = 2;
else
ylocation = 1;
op = &(locations[ylocation][xlocation])[w];
Other Implementation
(Suggested by Guy Steele)
op =
&( !y ? (!x ? upleft : x!=last ? upper : upright ) :
y!=bottom ? (!x ? left : x!=last ? normal : right ) :
(!x ? lowleft : x!=last ? lower : lowright )
)[w->orientation];
Other Refactoring Changes
- Add brackets in expressions
- Add missing comments
- But "don't comment bad code, rewrite it"
- Use better variable names
- Fix/improve indentation
- The indent program may be helpful (apply with care)
- Beware of forks; do it only on code you own
- Use a version control system
- Separate style commits from functional commits
Assignment Expressions
- Assignment in an if statement:
if ((p = q))
q[-1] = '\n';
- Avoided with explicit comparison:
if ((p = strchr(name, '=')) != NULL) {
p++;
- Made explicit with a non-modifiable lvalue:
if (0 == serconsole)
serconsinit = 0;
- In Java and C# the above matter only for boolean expressions
Bit Manipulation
- Example:
do {
putchar(' ');
column++;
} while (column & 07);
Rule:
a & b /* is sometimes used for */ a % (b + 1) /* b = 2**n - 1 */
- Example:
n = ((dp - cp) << 2) + 1; /* 4 times + NULL */
Rule:
a << n /* is sometimes used for */ a * k /* k = 2**n */
- Example:
bp = bp1 + ((bp2 - bp1) >> 1);
Rule:
a >> n /* is sometimes used for */ a / k /* k = 2**n */
Reading Control Structures
In structured programs:
- Treat the contents of a loop as a separate block.
Read
while (xenum.hasMoreElements()) {
/* [...] */
if (object instanceof Resource) {
/* [...] */
if (!copy(is, os))
/* [...] */
} else if (object instanceof InputStream) {
/* [...] */
if (!copy((InputStream) object, os))
/* [...] */
} else if (object instanceof DirContext) {
/* [...] */
}
}
as
while (xenum.hasMoreElements()) {
/* [...] */
}
- In a body of a structure, consider its controlling
expression as an asserion
if (!links.contains(next)) {
links.add(next);
}
- The above may not hold for sequences that include
- return
- goto
- break
- continue
- exceptions
Further Reading
- Edsger Wybe Dijkstra.
Go to statement considered harmful.
Communications of the ACM, 11(3):147–148, March 1968.
- Martin Fowler.
Refactoring: Improving the Design of Existing Code.
Addison-Wesley, Boston, MA, 2000.
With contributions by Kent Beck, John Brant, William Opdyke, and Don Roberts.
- C. A. R. Hoare.
Proof of a program: Find.
Communications of the ACM, 14(1):39–45, January 1971.
- Brian W. Kernighan
and P. J. Plauger.
The
Elements of Programming Style.
McGraw-Hill, New York, second edition, 1978.
- Brian W. Kernighan
and Dennis M. Ritchie.
The C
Programming Language.
Prentice-Hall, Englewood Cliffs, NJ, second edition, 1988.
- Donald E. Knuth.
The
Art of Computer Programming, volume 3: Sorting and Searching.
Addison-Wesley, Reading, MA, second edition, 1998.
- Peter Van Der Linden.
Expert
C Programming, pages 226–231.
Prentice-Hall, 1994.
- Richard J. Miara,
Joyce A. Musselman, Juan A. Navarro, and Ben Shneiderman.
Program indentation and comprehensibility.
Communications of the ACM, 26(11):861–867, 1983.
- Microsoft
Corporation.
Microsoft C# Language Specifications.
Microsoft Press, Redmond, WA, 2001.
- Paul W. Oman and Curtis R.
Cook.
Typographic style is more than cosmetic.
Communications of the ACM, 33(5):506–520, May 1990.
- Dennis M. Ritchie.
The C programming language—reference manual.
In Unix Programmer's Manual [Unix Programmer's Manual, 1979].
Also available online http://plan9.bell-labs.com/7thEdMan/.
- Diomidis Spinellis.
Code Reading: The Open
Source Perspective, pages 19–60.
Effective Software Development Series. Addison-Wesley, Boston, MA, 2003.
- Bjarne Stroustrup.
The
C++ Programming Language.
Addison-Wesley, Reading, MA, third edition, 1997.
- Ted Tenny.
Program readability: Procedures versus comments.
IEEE Transactions on Software Engineering, 14(9):1271–1279,
September 1988.
- Edward J. Thomas and
Paul W. Oman.
A bibliography of programming style.
ACM SIGPLAN Notices, 25(2):7–16, February 1990.
- UNIX
Programmer's Manual. Volume 2—Supplementary Documents.
Bell Telephone Laboratories, Murray Hill, NJ, seventh edition, 1979.
Also available online http://plan9.bell-labs.com/7thEdMan/.
- Larry Wall, Tom
Christiansen, Randal L. Schwartz, and Stephen Potter.
Programming Perl.
O'Reilly and Associates, Sebastopol, CA, third edition, 2000.
Exercises and Discussion Topics
-
Experiment to find-out how your C, C++, and Java compilers deal with
uninitialized variables.
Outline your results and propose an inspection procedure for
locating uninitialized variables.
-
Look in the course's source code repository for programs
that do not verify the result of library calls.
Propose practical fixes.
-
Examine the visibility of functions and variables in programs in your
environment.
Can it be improved (made more conservative)?
-
Discover how the editor you are using can identify matching braces and parentheses.
If it can not, consider to start using another editor.
-
Provide five examples from the course's source code collection
where the code structure can be improved to make it more readable.
-
You can find tens of intentionally unreadable C programs at the
International Obfuscated C Code Contest web site (http://www.ioccc.org).
Most of them use several layers of obfuscation to hide their algorithms.
See how gradual code changes can help you untangle their code.
If you are not familiar with the C preprocessor try to avoid programs
with a large number of #define lines.
-
The Perl language mandates the use of braces for all its control
structures.
Comment on how this affects the readability of Perl programs.
-
The code body of switch statements in the course's source code
collection is formatted differently from the other statements.
Express the formatting rule used, and explain its rationale.
-
The for statement in the C language family is very flexible.
Examine the source code provided to create a list of ten different uses.
-
Locate ten occurrences of break and continue in the source
code provided with the course.
For each case indicate the point where execution will transfer
after the corresponding statement is executed, and explain why
the statement is used.
Do not try to understand in full the logic of the code;
simply provide an explanation based on the statement's use pattern.
-
Find, simplify, and reason about five non-trivial boolean expressions in
the source-code base.
Do not spend time on understanding what the expression elements mean,
concentrate on the conditions that will make the expression become true
or false.
Where possible, identify and utilize the properties of short-circuit evaluation.
-
Study Section 2.11 from the book.
Provide a proof about the correctness of a sort algorithm,
found in the course's soruce repository.
Appendix: Advanced C Data Types
Pointers
Used:
- To construct linked data structures
- To reference dynamically allocated data structures
- To implement call by reference
- To access data elements and iterate through them
- When passing arrays as arguments
- For referring to functions
- As an alias for another value
- To represent character strings
- For direct access to system memory
Structures
Used:
- To group together data elements typically used as a whole
- To return multiple data elements from a function
- To construct linked data structures
- To map the organization of data on
- hardware devices
- network links
- storage media
(This use is non-portable)
- To implement abstract data types
- To program in an object-oriented fashion
Unions
Used:
- To use storage efficiently
- To implement polymorphism
- To access data using different internal representations
Dynamic Memory Allocation
- Ensure memory allocated with malloc is always freed
- Pointers do NOT have to initialized with malloced memory
- When objects are shared memory is often managed with a use count and a garbage collector
- alloca is sometimes used to allocate dynamic data on the stack (non-portable)
- Structures with a variable-sized array element are sometimes allocated in a single block
typedef Declarations
Further Reading
- Hans-Juergen Boehm.
Garbage collection in an uncooperative environment.
Software: Practice & Experience, 18(9):807–820, September
1988.
- Maurice Bruynooghe.
The memory management of Prolog implementations.
In Keith L. Clark and Sten-Åke Tärnlund, editors, Logic
Programming, pages 83–98. Academic Press, London, 1982.
- D. R. Chase, W. Wegman,
and F. K. Zadeck.
Analysis of pointers and structures.
ACM SIGPLAN Notices, 25(6):296–319, June 1990.
- Thomas W.
Christopher.
Reference count garbage collection.
Software: Practice & Experience, 14(6):503–507, June 1984.
- David Detlefs,
Al Dosser, and Benjamin Zorn.
Memory allocation costs in large C and C++ programs.
Software: Practice & Experience, 24(6):527–542, June 1994.
- Christopher W.
Fraser and David R. Hanson.
Exploiting machine-specific pointer operations in abstract machines.
Software: Practice & Experience, 12:367–373, 1982.
- C. A. R. Hoare.
Recursive data structures.
Technical Report AIM-223 STAN-CS-73-400, Stanford University, Computer Science
Department, October 1973.
- Brian W. Kernighan
and Dennis M. Ritchie.
The C
Programming Language.
Prentice-Hall, Englewood Cliffs, NJ, second edition, 1988.
- Kai Li and Paul Hudak.
A new list compaction method.
Software: Practice & Experience, 16(2):145–163, February 1986.
- Ravi Sethi and Tom
Stone.
Programming Languages: Concepts and Constructs.
Addison-Wesley, second edition, 1996.
- Diomidis Spinellis.
Code Reading: The Open
Source Perspective, pages 61–93.
Effective Software Development Series. Addison-Wesley, Boston, MA, 2003.
- Norihisa Suzuki.
Analysis of pointer ``rotation''.
Communications of the ACM, 25(5):330–335, May 1982.
Exercises and Discussion Topics
-
If you are familiar with C++ or Java explain how it is possible to minimize
(in C++) or avoid (in Java) the use of pointers.
-
How does a C pointer differ from a memory address?
How does that affect your understanding of code?
Which tools take advantage of the difference?
-
How can a program using structures to specify a memory layout ensure
that they are correctly allocated?
-
How would you translate a program using unions into Java?
Appendix: Data Structures in C
Vectors
Asymmetric Bounds
#define MAX_ADS 5
struct dr { /* accumulated advertisements */
[...]
n_long dr_pref; /* preference adjusted by metric */
} *cur_drp, drs[MAX_ADS];
[...]
struct dr *drp;
[...]
for (drp = drs; drp < &drs[MAX_ADS]; drp++) {
drp->dr_recv_pref = 0;
drp->dr_life = 0;
}
In the code above drs (the lower bound) is the first element of the array;
drs[MAX_ADS] (the upper bound) the first element outside the array.
When using asymmetric bounds:
- Number of elements is equal to the difference between upper and lower bound.
- When upper bound equals lower bound the range is empty.
- The lower bound is the first occupied element.
- The upper bound is the first free element.
Matrices
- Multi-dimensional structures of elements of the same type
- Fixed-sized implemented as multi-dimensional arrays
typedef double Transform3D[4][4];
IdentMat(Transform3D m)
{
register int i;
register int j;
for (i = 3; i >= 0; --i)
{
for (j = 3; j >= 0; --j)
m[i][j] = 0.0;
m[i][i] = 1.0;
}
}
- Variable-sized matrices are handled through pointers to elements,
but accessed using the array syntax:
double **mat = (double **)xmalloc(rows * sizeof(double *));
for (i = 0; i < rows; i++)
mat[i] = (double *)xmalloc(cols * sizeof(double));
- N dimensions can also be mapped into a flat structure using
multiplication.
Tables
- Rows of different-typed fields
- Often implemented as arrays of structures
static struct tok icmp2str[] = {
{ ICMP_ECHOREPLY, "echo reply" },
{ ICMP_SOURCEQUENCH, "source quench" },
{ ICMP_ECHO, "echo request" },
[...]
{ 0, NULL }
};
- Can be accessed using a pointer (type-safe)
static struct user {
uid_t uid;
char *name;
daddr_t space;
long count;
daddr_t spc30;
daddr_t spc60;
daddr_t spc90;
} *users;
/* [...] */
struct user *usr, *usrs;
for (usr = usrs, n = nusers; --n >= 0 && usr->count; usr++) {
printf("%5ld", (long)SIZE(usr->space));
if (count)
printf("\t%5ld", (long)usr->count);
printf("\t%-8s", usr->name);
if (unused)
printf("\t%5ld\t%5ld\t%5ld",
SIZE(usr->spc30), SIZE(usr->spc60),
SIZE(usr->spc90));
printf("\n");
}
free(usrs);
}
- Accessing with an integer index is not type safe (why?)
Stacks
- Often implemented as an ADT
#define STACKMAX 32
static int opstack[STACKMAX];
static int opsp;
PushOp(op)
int op;
{
if (opsp==STACKMAX) {strcpy(dispstr,"stack error"); entered=3;}
else opstack[opsp++]=op;
}
int PopOp()
{
if (opsp==0) {
strcpy(dispstr,"stack error");
entered=3;
return(kNOP);
} else
return(opstack[--opsp]);
}
int isopempty()
{
return( opsp ? 0 : 1 );
}
- Also implemented in-line:
de_stack[--stack_top] = '0' + c % 8;
de_stack[--stack_top] = '0' + (c / 8) % 8;
de_stack[--stack_top] = '0' + c / 64;
Problem with overflow-underflow checking.
Queues
Maps
- Trivial map, integer-indexed:
static const int mon_lengths[2][MONSPERYEAR] = {
{ 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 },
{ 31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 }
};
- More sophisticated: key/value pairs
- The following map is used to dispatch functions based on the
name of the command:
#define F_NEEDARG 0x01 /* needs an argument */
#define F_OFFOK 0x02 /* can turn off */
static struct key {
char *name; /* name */
void (*f) __P((struct info *)); /* function */
int flags;
} keys[] = {
{ "all", f_all, 0 },
{ "cbreak", f_cbreak, F_OFFOK },
{ "cols", f_columns, F_NEEDARG },
{ "columns", f_columns, F_NEEDARG },
{ "cooked", f_sane, 0 },
{ "dec", f_dec, 0 },
{ "everything", f_everything, 0 },
{ "extproc", f_extproc, F_OFFOK },
/* [...] */
{ "size", f_size, 0 },
{ "speed", f_speed, 0 },
{ "tty", f_tty, 0 },
};
static int c_key __P((const void *, const void *));
static int
c_key(a, b)
const void *a, *b;
{
return (strcmp(((struct key *)a)->name, ((struct key *)b)->name));
}
int
ksearch(argvp, ip)
char ***argvp;
struct info *ip;
{
char *name;
struct key *kp, tmp;
name = **argvp;
if (*name == '-') {
ip->off = 1;
++name;
} else
ip->off = 0;
tmp.name = name;
if (!(kp = (struct key *)bsearch(&tmp, keys,
sizeof(keys)/sizeof(struct key), sizeof(struct key), c_key)))
return (0);
if (!(kp->flags & F_OFFOK) && ip->off) {
warnx("illegal option -- %s", name);
usage();
}
if (kp->flags & F_NEEDARG && !(ip->arg = *++*argvp)) {
warnx("option requires an argument -- %s", name);
usage();
}
kp->f(ip);
return (1);
}
void
f_all(ip)
struct info *ip;
{
print(&ip->t, &ip->win, ip->ldisc, BSD);
}
Hash Tables
- Construct integer array index out of the (non-integer) key
static
Hash(char *string, int len)
{
int h;
h = 0;
while (len--)
h = (h << 3) ^ *string++;
if (h < 0)
return -h;
return h;
}
- Hash function, converts key into an integer
- A modulo function limits the integer into the size of a table
- A scheme for handling collisions must be provided (unless
this is a perfect hash function)
Atom
MakeAtom(string, len, makeit)
char *string;
unsigned len;
int makeit;
{
AtomListPtr a;
int hash;
int h;
int r;
hash = Hash (string, len);
if (hashTable)
{
h = hash & hashMask;
if (hashTable[h])
{
if (hashTable[h]->hash == hash && hashTable[h]->len == len &&
NameEqual (hashTable[h]->name, string, len))
{
return hashTable[h]->atom;
}
r = (hash % rehash) | 1;
for (;;)
{
h += r;
if (h >= hashSize)
h -= hashSize;
if (!hashTable[h])
break;
if (hashTable[h]->hash == hash && hashTable[h]->len == len &&
NameEqual (hashTable[h]->name, string, len))
{
return hashTable[h]->atom;
}
}
}
}
Sets
- Small sets are sometimes represented as collection of bits
- Elements are added using binary-or
pbitvec[j/BITS_PER_LONG] |= ((unsigned long)1 << (j % BITS_PER_LONG));
- Element presence is implemented using binary-and
#define FD_ISSET(n, p) \
((p)->fds_bits[(n)/NFDBITS] & (1 << ((n) % NFDBITS)))
- Element removal is implemented using binary-and of the complement
#define FD_CLR(n, p) \
((p)->fds_bits[(n)/NFDBITS] &= ~(1 << ((n) % NFDBITS)))
- Set intersection is implemented using binary and
#define XFD_ANDSET(dst,b1,b2) \
(dst)->fds_bits[0] = ((b1)->fds_bits[0] & (b2)->fds_bits[0]); \
- Set union is implemented using binary or
- Larger sets are sometimes represented as an array of
values
resourceQuarks[q >> 3] |= 1 << (q & 0x7);
Singly Linked Lists
- Properties:
- Efficient addition and removal from any position
- Linear traversal to access arbitrary elements
- Efficient dynamic expansion
- List representation:
struct host_list {
struct host_list *next;
struct in_addr addr;
} *hosts;
- Traversal:
int
search_host(addr)
struct in_addr addr;
{
struct host_list *hp;
if (!hosts)
return(0);
for (hp = hosts; hp != NULL; hp = hp->next) {
if (hp->addr.s_addr == addr.s_addr)
return(1);
}
return(0);
}
- Element addition:
void
remember_host(addr)
struct in_addr addr;
{
struct host_list *hp;
if (!(hp = (struct host_list *)malloc(sizeof(struct host_list)))) {
err(1, "malloc");
/* NOTREACHED */
}
hp->addr.s_addr = addr.s_addr;
hp->next = hosts;
hosts = hp;
}
Doubly Linked Lists
- Uses two pointers:
struct queue {
struct queue *q_next, *q_prev;
};
- Adding and removing elements can be done given only a single element pointer
- Example: add elem after head:
struct queue *next;
next = head->q_next;
elem->q_next = next;
head->q_next = elem;
elem->q_prev = head;
next->q_prev = elem;
Circular Linked Lists
- Last element points to the head
- Can not be traversed using a simple for loop
- Typically the code looks for an element to occur again
void
prlex(FILE *fp, struct wordent *sp0)
{
struct wordent *sp = sp0->next;
for (;;) {
(void) fprintf(fp, "%s", vis_str(sp->word));
sp = sp->next;
if (sp == sp0)
break;
if (sp->word[0] != '\n')
(void) fputc(' ', fp);
}
}
Trees
- Single path from each node to the trees root
- Applications
- Data organization
- Searching
- Sorting
- Language processing
- Graphics
- Compression
- Typically implemented using pointers to nodes:
typedef struct tree_s {
tree_t data;
struct tree_s *left, *right;
short bal;
}
tree;
- Processing algorithms are also recursive:
tree_t
tree_srch(tree **ppr_tree, int (*pfi_compare)(), tree_t p_user)
{
register int i_comp;
if (*ppr_tree) {
i_comp = (*pfi_compare)(p_user, (**ppr_tree).data);
if (i_comp > 0)
return (tree_srch(&(**ppr_tree).right, pfi_compare, p_user))
if (i_comp < 0)
return (tree_srch(&(**ppr_tree).left, pfi_compare, p_user))
/* not higher, not lower... this must be the one.
*/
return ((**ppr_tree).data)
}
/* grounded. NOT found.
*/
return (NULL)
}
Further Reading
- Alfred V. Aho, John E.
Hopcroft, and Jeffrey D. Ullman.
The
Design and Analysis of Computer Algorithms.
Addison-Wesley, Reading, MA, 1974.
- Arash Baratloo, Timothy
Tsai, and Navjot Singh.
Transparent run-time defense against stack smashing attacks.
In Christopher Small, editor, USENIX 2000 Technical Conference
Proceedings, Berkeley, CA, June 2000. Usenix Association.
- Mark W. Eichlin and
Jon A. Rochlis.
With microscope and
tweezers: An analysis of the internet virus of November 1988.
In IEEE Symposium on Research in Security and Privacy, pages
326–345, May 1989.
- Donald E. Knuth.
The
Art of Computer Programming, volume 1: Fundamental Algorithms.
Addison-Wesley, Reading, MA, third edition, 1997.
- Donald E. Knuth.
The
Art of Computer Programming, volume 3: Sorting and Searching.
Addison-Wesley, Reading, MA, second edition, 1998.
- Douglas C. Schmidt.
Gperf: A perfect hash function generator.
In Jim Waldo, editor, USENIX C++ Conference, pages 87–100,
Berkeley, CA, April 1990. Usenix Association.
- Robert Sedgewick.
Algorithms in C Parts 1–4: Fundamentals, Data Structures, Sorting,
Searching.
Addison-Wesley, Reading, MA, third edition, 1997.
- Robert Sedgewick.
Algorithms in C Part 5: Graph Algorithms.
Addison-Wesley, Boston, MA, third edition, 2001.
- Eugene H. Spafford.
The Internet worm: Crisis and aftermath.
Communications of the ACM, 32(6):678–687, June 1989.
- Diomidis Spinellis.
Code Reading: The Open
Source Perspective, pages 95–140.
Effective Software Development Series. Addison-Wesley, Boston, MA, 2003.
- H. E. Williams,
J. Zobel, and S. Heinz.
Self-adjusting trees in practice for large text collections.
Software: Practice & Experience, 31(10):925–940, August 2001.
Exercises and Discussion Topics
-
Select a large software system, locate where arrays are used,
and categorize their uses.
-
Fixed-length arrays notoriously impose arbitrary limits to
program elements.
Locate 20 instances of such limits in existing code and check whether
these are documented in the respective system's documentation.
-
For the RDBMS of your choice locate documentation regarding cursors.
Compare the functionality offered by cursors to the functionality offered
by pointers to structures.
-
A number of linear algebra and matrix libraries are available on the Web
in open-source form.
Download two such packages and identify how matrices are stored.
-
Examine the stack operations performed on implementations based on
explicit code.
Identify those operations that do not fit into the model we described.
Propose function names and code for implementing them.
-
Locate code where explicit control structures can be substituted
by an array lookup mechanism.
Explain how the code would be implemented.
-
Do you envisage any difficulties in implementing a hash-based map as an
abstract data type?
Suggest a way to overcome them.
-
Locate instances in the course's reference source code where sets
are being used.
Suggest other implementation strategies for each particular
problem.
Comment on whether a set data-type is used partly, because
it is easy to create an efficient implementation for it.
-
Locate five instances of singly and doubly-linked lists in the
course's reference source code and explain the function they perform.
-
Draw a binary tree containing the host names in your email
address book.
Add the names in a random order; not alphabetically.
Systematically walk through the tree to derive a sorted
listing of the host names.
Construct a new tree adding the host names in the order they
appear in the listing you constructed.
Comment on the efficiency of searching data in the two trees.
Appendix: Collaboration
Mailing Lists
VCS Notifications (CVS)
- Arrange for your VCS to notify all committers
- Committers can arrange for a special folder for these notifications
- Concept similar to ATC radio channel
Example Notification
Date: Sat, 19 Aug 2006 08:24:01 +0000 (UTC)
Subject: cvs commit: src/usr.bin/pkill Makefile
X-FreeBSD-CVS-Branch: HEAD
yar 2006-08-19 08:24:01 UTC
FreeBSD src repository
Modified files:
usr.bin/pkill Makefile
Log:
Install pkill(1), aka pgrep(1), to /bin so that rc scripts
can use this small and nifty utility. Create compatibility
symlinks from /usr/bin for the time being to avoid breaking
custom scripts relying on the hardcoded path to the utility.
Idea by: des
Discussed with: brooks (in cvs-src and cvs-all)
Revision Changes Path
1.6 +5 -0 src/usr.bin/pkill/Makefile
Setting up CVS Notifications (CVS)
Example:
#!/bin/sh
(
id -P |
awk -F: '{
print $1 " (" gensub(",.*", "", "", $8) ")\t" strftime("%F %R:%S %z (%Z)")
}
END {
print "\nRepository: '"$2"'\n"
}'
echo "Directory: $1"
egrep -v '^In directory|^Update of'
) | mail -s "[$2] cvs commit $1" $3
Setting up CVS Notifications (git)
- State project name for Git by altering first row of $GIT_DIR/description
- Inform Git about the mail configuration by appending in $GIT_DIR/config
[hooks]
mailinglist = email1@example.com email2@example.com
senderemail = myemail@example.com
- Create a notification script in $GIT_DIR/hooks/post-commit.sample
Example:
#!/bin/sh
#
# An example hook script that is called after a successful
# commit is made.
#
# To enable this hook, rename this file to "post-commit".
projectdesc=`sed -ne '1p' "$GIT_DIR/description"`
recipients=`git config hooks.mailinglist`
full_commit_log=`git log -p -1 HEAD`
commit_msg=`(git log -1 HEAD |
egrep -v '^commit|^Author|^Date')`
(
id -P |
gawk -F: '{
print $1 " (" gensub(",.*", "", "", $8) ") committed:"
}
END {
print "\nRepository: '"$2"'\n"
}'
echo "Branch of commit is: `git branch`\n"
echo "$full_commit_log"
) | mail -s "[$projectdesc] git commit '$commit_msg'" $recipients
- Rename post-commit.sample to post-commit
mv post-commit.sample post-commit
- Change permissions of post-commit to be able to execute
chmod a+x post-commit
Documentation is Ubiquitous
- System specification
- Software requirements specification
- Design specification
- Test specification
- User documentation
- Functional description
- Installation instructions
- Introductory guide, tutorial
- Reference manual
- Administrator manual
Wikis
Advantages of Wikis for storing documentation.
- Low-overhead editing
- Always current
- Revision history
- Easily accessible
Things to put on a wiki
- Project scorecards
- Personal pages (assgined projects, tasks, planned vacations)
- Time schedule
- Setup, build, release documentation
- Specifications
- Third party documentation pointers
Wiki Best Practices
Issue Tracking
- Together with VCS the two most important process-oriented tools
- Holds bugs, problems, ideas
- Each issue gets a unique id
- Should be coordinated with VCS
Issue Reports
Track bugs according to:
- area of functionality
- state
- resolution
- priority
- severity
Example: Bugzilla
Severity of a Bug
Blocker |
Blocks development or testing work (e.g. a build) |
Critical |
Crashes, loss of data, severe memory leak |
Major |
major functionality problem |
Minor |
Minor functionality problem, or easy to work around |
Trivial |
Cosmetic problem |
Enhancement |
Request for enhancement |
An Issue's Lifecycle
Resolution Options
- Fixed
- Invalid (not a bug)
- Won't fix
- Duplicate (of another bug)
- Works for me
- Moved (to another project)
Example: Releasing
Before a release.
- Triage bugs
- Fix
- Document
- Hide feature
- Next release
- Introduce a code freeze period
- Only showstoppers ...
- ... with no workaround
Bug Tracking Best Practices
- Use an existing issue tracking system
- No more than one bug per issue
- Include bug replication information
- Associate VCS commits with bug ids
- Minimize the number of bugs left open
- Use priorities to manage the developer workload
- Backup your issue database
Further Reading