TTIOK KWIC Programmer's Manual
This version of KWIC is a
fusion of Chaz's code and Aaron's code. Originally, we attempted
to just use Aaron's code, but each person's code had enough specific
strengths (Aaron had implemented comparators and a rudimentary process
class, and Chaz had a more modular program design) to make it
worthwhile to use both.
HOLISTIC OVERVIEW
The KWIC program is designed to return
KWICs, or KeyWords In Context. The program has been
redesigned to facilitate modularity: the main 'kwic' class has
been split into four classes (input, process, config, and output).
This creates a distinct separation between the main aspects of
the program, which makes it easier to figure out where (and how) to
implement new features.
DESIGN DECISIONS
One aspect of particular interest is
the processing of the kwic.properties file. We decided that the file
should get some sort of error checking, but it was not immediately
obvious how to perform it. One idea was to attempt to convert each
argument to an integer to check whether it contains proper integral
data. What we settled on was the K_PROP structure (see below). A K_PROP
has a field containing the string as entered and a field containing the
integer value that atoi produces from that string. It is left to the
programmer to know which field is the meaningful one for a given
setting. This was done to keep settings flexible: instead of having to
hard code each property, there is a uniform interface for adding new
settings.
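As an illustration of the idea above, here is a minimal sketch of how a K_PROP might be filled in. The field names (name, text, number) and the helper makeProp are assumptions made for this example and are not taken from the project's source.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical layout: the manual only says K_PROP holds a name,
// a string field, and an integer field. These field names are assumed.
struct K_PROP {
    std::string name;    // setting name, e.g. "order"
    std::string text;    // value exactly as written in kwic.properties
    int number;          // std::atoi(text); 0 if text does not start with a number
};

// Hypothetical helper: build a property from a name and its raw value.
// Both fields are always filled; the caller decides which one to read.
K_PROP makeProp(const std::string& name, const std::string& value) {
    K_PROP p;
    p.name = name;
    p.text = value;
    p.number = std::atoi(value.c_str());
    return p;
}
```

Note that atoi cannot tell a genuine 0 apart from a non-numeric string, which is one reason the choice of field is left to the programmer.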
Another interesting design point is transferring the
settings data to the various parts of the program. We came up
with a fairly elegant solution: a function called getSetting inside the
k_config class which, when called with a string argument (example:
k_config.getSetting("order")), returns the proper K_PROP.
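A lookup of this kind could be sketched roughly as follows. Only the names k_config, getSetting, and K_PROP come from this manual. The K_PROP field names, the addSetting helper (standing in for parsing kwic.properties), and the behavior for an unknown name are assumptions.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Assumed field names; see the K_PROP description in this manual.
struct K_PROP {
    std::string name;
    std::string text;
    int number;
};

class k_config {
public:
    // Hypothetical helper standing in for reading kwic.properties.
    void addSetting(const std::string& name, const std::string& value) {
        K_PROP p;
        p.name = name;
        p.text = value;
        p.number = std::atoi(value.c_str());
        settings.push_back(p);
    }

    // Return the K_PROP with the given name. Returning an empty
    // K_PROP for an unknown name is an assumption of this sketch.
    K_PROP getSetting(const std::string& name) const {
        for (std::size_t i = 0; i < settings.size(); ++i) {
            if (settings[i].name == name) return settings[i];
        }
        K_PROP missing;
        missing.name = name;
        missing.number = 0;
        return missing;
    }

private:
    std::vector<K_PROP> settings;
};
```

With this shape, any part of the program that holds a k_config object can ask for a setting by name without knowing how the file was parsed.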
A less elegant design decision was the
implementation of comparator selection as a 6-branch if-tree.
This choice was primarily motivated by time, as it was
very quick to implement using cut/paste.
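A six-branch selection of this sort might look like the sketch below. The six setting values match the comparators listed later in this manual. The function names, the trimmed-down WORD, and the fallback to alphabetical order are assumptions for illustration.

```cpp
#include <string>

// WORD trimmed to the two fields the comparators need (assumed names).
struct WORD {
    std::string keyword;
    int count;  // number of occurrences
};

typedef bool (*WordCompare)(const WORD&, const WORD&);

bool byWord(const WORD& a, const WORD& b)          { return a.keyword < b.keyword; }
bool byReverseWord(const WORD& a, const WORD& b)   { return a.keyword > b.keyword; }
bool byLength(const WORD& a, const WORD& b)        { return a.keyword.size() < b.keyword.size(); }
bool byReverseLength(const WORD& a, const WORD& b) { return a.keyword.size() > b.keyword.size(); }
bool byNumber(const WORD& a, const WORD& b)        { return a.count > b.count; }  // most to least
bool byReverseNumber(const WORD& a, const WORD& b) { return a.count < b.count; }  // least to most

// One branch per supported value of the "order" setting.
WordCompare chooseComparator(const std::string& order) {
    if (order == "word")               return byWord;
    else if (order == "reverseword")   return byReverseWord;
    else if (order == "length")        return byLength;
    else if (order == "reverselength") return byReverseLength;
    else if (order == "number")        return byNumber;
    else if (order == "reversenumber") return byReverseNumber;
    return byWord;  // assumed fallback for an unrecognized value
}
```

A table keyed by setting name would remove the repetition, but the if-tree has the advantage that each branch can be copied and adjusted in seconds.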
Another design decision was the creation of the
k_config class. Though this was not in our original plan, it soon
became clear that this was the most modular and efficient way to handle
the kwic.properties file.
The WORD structure was modified from Aaron's
code. The WORD struct allows the keyword entries to be sorted and
modified (for example, by excluding certain keywords) rather
easily. Its usefulness in streamlining the
program is clear when you consider that Chaz's individual
implementation had intertwined the process and output functionalities.
PROGRAM STRUCTURE
The program includes the following files:
kwic.cpp -- contains main.
k_input.h/cpp -- contains input functions for the program. The input
subsystem reads the command line arguments and, from them, reads the
code from the specified files and directories. Its final product, which
it passes to the other classes, is a vector of KEYS (see below).
k_process.h/cpp -- contains sorting and other miscellaneous functions.
The process class takes the vector of KEYS and converts it to a vector
of WORDs. It then sorts the WORDs and processes any exclude data.
k_output.h/cpp -- contains output functions, such as text formatting.
The output subsystem was not successfully debugged by the end of the
project. This is solely due to running out of time.
k_config.h/cpp -- contains functions pertaining to the kwic.properties
file. This class was separated from the other three because it can be
implemented without being interwoven with them. This added modularity
made the program easier to code, read and modify.
*comparator.h -- each contains a comparator for use with sort. Multiple
comparators are implemented.
kwic.properties -- ASCII file which contains properties that affect the
functioning of the KWIC program.
globals.h -- contains structs which are used throughout the 4 classes.
GLOBALS.H STRUCTS
K_PROP -- contains a string field and an integer field, as well as a
field for the setting's name.
KEYS -- contains string fields for the name and filename and an int for
the line number.
WORD -- a more complex struct. Contains strings for the keyword and
filename, vectors of strings for the before and after fields, and ints
for the line number and the number of occurrences in the file. WORDs
are sorted and then displayed as final output.
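For orientation, the KEYS and WORD descriptions above could translate into declarations along these lines. The struct names come from this manual. Every field name is an assumption, chosen only to mirror the prose.

```cpp
#include <string>
#include <vector>

// Produced by the input subsystem: one entry per item read.
// Field names are assumed, not taken from the project source.
struct KEYS {
    std::string name;
    std::string filename;
    int line;                          // line number in the file
};

// Produced by the process class, then sorted and printed.
struct WORD {
    std::string keyword;
    std::string filename;
    std::vector<std::string> before;   // context words preceding the keyword
    std::vector<std::string> after;    // context words following the keyword
    int line;                          // line number in the file
    int occurrences;                   // number of occurrences in the file
};
```

Keeping the context in vectors rather than one preformatted string leaves the formatting decisions to the output subsystem.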
INCLUDED COMPARATORS
word -- alphabetical sort
reverseword -- reverse alphabetical sort
length -- sort by word length (shortest to longest)
reverselength -- sort by word length (longest to shortest)
number -- sort by number of occurrences (most to least)
reversenumber -- sort by number of occurrences (least to most)
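To show how such comparators plug into sort, here is a small sketch with two of the six orderings written as function objects. The struct names, the trimmed-down WORD, and the sortedKeywords helper are hypothetical. Only the orderings themselves follow the list above.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// WORD trimmed to the fields these comparators read (assumed names).
struct WORD {
    std::string keyword;
    int occurrences;
};

// "length": shortest keyword first.
struct LengthComparator {
    bool operator()(const WORD& a, const WORD& b) const {
        return a.keyword.size() < b.keyword.size();
    }
};

// "number": most occurrences first.
struct NumberComparator {
    bool operator()(const WORD& a, const WORD& b) const {
        return a.occurrences > b.occurrences;
    }
};

// Hypothetical helper: sort a copy of the words with the given
// comparator and return just the keywords in their new order.
template <typename Compare>
std::vector<std::string> sortedKeywords(std::vector<WORD> words, Compare cmp) {
    std::sort(words.begin(), words.end(), cmp);
    std::vector<std::string> out;
    for (std::size_t i = 0; i < words.size(); ++i) {
        out.push_back(words[i].keyword);
    }
    return out;
}
```

Because std::sort is not stable, entries that compare equal (for example, two keywords of the same length) may come out in either order. std::stable_sort would preserve their input order if that matters.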