Ramrod - Key Words In Concordance
Will Fleming and VJ Singh
Key Words in Concordance (KWIC) is a highly configurable contextual and
concordance analysis program. Input can be accepted via text files or directories
of text files. Sorting can be configured on attributes such as lexicon, word
length, word frequency, etc. Output can be formatted to ASCII, HTML, Excel, Word, XML, or PDF,
and may be restricted to elements that meet certain criteria.
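To illustrate what a keyword-in-concordance listing contains, here is a minimal sketch, not the project's actual code. The class name KwicSketch, the Line holder, and the width parameter (number of context words on each side) are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: these names are illustrative, not the project's classes.
public class KwicSketch {
    /** One concordance line: a keyword plus the words on either side of it. */
    public static final class Line {
        public final String left;
        public final String keyword;
        public final String right;

        Line(String left, String keyword, String right) {
            this.left = left;
            this.keyword = keyword;
            this.right = right;
        }
    }

    /** Builds one line per word, with up to width context words on each side. */
    public static List<Line> concordance(String text, int width) {
        if (width < 0) {
            throw new IllegalArgumentException("width must be non-negative");
        }
        List<Line> lines = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return lines;
        }
        String[] words = trimmed.split("\\s+");
        for (int i = 0; i < words.length; i++) {
            int from = Math.max(0, i - width);
            int to = Math.min(words.length, i + 1 + width);
            // The left context ends just before index i, so the keyword is not repeated in it.
            String left = String.join(" ", Arrays.copyOfRange(words, from, i));
            String right = String.join(" ", Arrays.copyOfRange(words, i + 1, to));
            lines.add(new Line(left, words[i], right));
        }
        return lines;
    }

    public static void main(String[] args) {
        for (Line line : concordance("the quick brown fox jumps", 2)) {
            System.out.printf("%15s  [%s]  %s%n", line.left, line.keyword, line.right);
        }
    }
}
```

Each input word becomes a keyword once, so a five-word input yields five lines. Sorting, filtering, and output formatting would then operate on the resulting list.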
- Development Status: Final design
- Intended Audience: Robert Duvall, et al.
- Programming Language: Fortran =P
Team Ramrod officially launched KWIC. The marketing site was created,
as was the Programmer's Manual. After reviewing the requirements for
the final project, we identified the changes needed to combine our
two code bases.
Will's code was far more succinct and so formed the
foundation for KWIC. VJ's encapsulation techniques are to be
implemented in Will's code to aid future modifications. Furthermore,
the final project will feature interfaces, Javadoc tags and automated
JUnit test cases.
The algorithms will need to be redesigned, however, in
order to achieve stability and higher processing speeds for larger
input.
By this stage most of the code had been moved successfully into
separate methods, allowing for future modular extension. Classes were
created for I/O, sorting, the object containers, etc. Work then began
on adding functionality to the program, especially to input and
output. The most progress was made with the Sort class, thanks to its
modular design.
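One way a modular sort can be organised is to express each ordering as a java.util.Comparator and pass it to a single sorting routine. The sketch below shows that idea under stated assumptions; it is not the actual Sort class, and the names SortSketch, LEXICAL, BY_LENGTH and sorted are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of pluggable sort orders; not the project's Sort.java.
public class SortSketch {
    /** Alphabetical order, ignoring case. */
    public static final Comparator<String> LEXICAL = String.CASE_INSENSITIVE_ORDER;

    /** Shortest words first; words of equal length fall back to alphabetical order. */
    public static final Comparator<String> BY_LENGTH =
            Comparator.comparingInt(String::length).thenComparing(LEXICAL);

    /** Returns a sorted copy, leaving the caller's list untouched. */
    public static List<String> sorted(List<String> keywords, Comparator<String> order) {
        List<String> copy = new ArrayList<>(keywords);
        copy.sort(order);
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<>();
        words.add("pear");
        words.add("Fig");
        words.add("apple");
        System.out.println(sorted(words, LEXICAL));   // alphabetical
        System.out.println(sorted(words, BY_LENGTH)); // by word length
    }
}
```

Adding a new sort attribute then means adding one comparator rather than another sorting method.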
Within the project we have created the following files:
- kwic.properties - the file that contains many of the settings used to configure this program
- Configuration.java - devoted to parsing the kwic.properties file and informing the rest of the program of its settings
- Entry.java - contains each keyword in concordance (and supporting data)
- Input.java - controls inputting from various streams into the rest of the program
- Output.java - controls output to various streams from the rest of the program's processing
- Sort.java - the class responsible for sorting a collection of Entry objects
- Main.java - the primary class that ties together all other classes
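As a rough illustration of how a class like Configuration.java might read kwic.properties, the sketch below uses the standard java.util.Properties loader. The "aligned" key appears elsewhere in this document; the "max" key name, the default values, and every class and method name here are assumptions made for the example.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical sketch; the real Configuration.java may look quite different.
public class ConfigurationSketch {
    private final Properties props = new Properties();

    public ConfigurationSketch(Reader source) throws IOException {
        props.load(source); // parses key=value lines, skipping comments
    }

    /** Convenience factory for in-memory settings, e.g. in tests. */
    public static ConfigurationSketch fromString(String text) {
        try {
            return new ConfigurationSketch(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Whether output columns are aligned; the default of true is an assumption. */
    public boolean isAligned() {
        return Boolean.parseBoolean(props.getProperty("aligned", "true").trim());
    }

    /** Assumed "max" key; 0 is taken here to mean "no limit". */
    public int getMax() {
        return Integer.parseInt(props.getProperty("max", "0").trim());
    }

    public static void main(String[] args) {
        ConfigurationSketch config = fromString("aligned=false\nmax=25\n");
        System.out.println("aligned=" + config.isAligned() + ", max=" + config.getMax());
    }
}
```

To read the real file, the constructor could be given a java.io.FileReader opened on kwic.properties instead of a StringReader.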
The program design is mostly complete; issues that still need to be addressed have been identified.
This is a partial list of the issues:
- The last word of the context before the keyword duplicates the keyword
- Line numbers need fixing
- When "aligned=false" in kwic.properties, the line numbers are too close to the context words
- The max option requires implementation
- Regular expressions need to be readable (include/exclude need implementation)
- Printing methods need to be collapsed to reuse code
- Directory input needs to be working
- Frequency sorting needs work
- Comments, the user manual, the programmer's manual and test cases need work
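For the include/exclude item above, one possible approach is sketched below with java.util.regex. The semantics are an assumption: a keyword is kept when it fully matches the include pattern (if one is given) and does not fully match the exclude pattern (if one is given). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of include/exclude filtering; semantics are assumed.
public class KeywordFilterSketch {
    /** Compiles a pattern, treating null or empty text as "no pattern". */
    private static Pattern compileOrNull(String regex) {
        return (regex == null || regex.isEmpty()) ? null : Pattern.compile(regex);
    }

    public static List<String> filter(List<String> keywords, String include, String exclude) {
        Pattern inc = compileOrNull(include);
        Pattern exc = compileOrNull(exclude);
        List<String> kept = new ArrayList<>();
        for (String keyword : keywords) {
            // matches() requires the whole keyword to match, not just a substring.
            boolean included = inc == null || inc.matcher(keyword).matches();
            boolean excluded = exc != null && exc.matcher(keyword).matches();
            if (included && !excluded) {
                kept.add(keyword);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<>();
        words.add("cat");
        words.add("car");
        words.add("dog");
        words.add("cart");
        System.out.println(filter(words, "ca.*", "cart"));
    }
}
```

The two pattern strings could come straight from kwic.properties, with a missing key simply meaning that no filter is applied.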
The program is mostly ready, but there are a few implementation issues we are trying to work around.
Directory input gave us a bit of trouble but has been taken care of; the frequency components
(the max option and number sort), however, are still not working. Test cases and a program artifact
still need to be designed.
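A frequency sort with a cap could be sketched as follows. This assumes that "max" limits how many entries are reported and that the number sort orders entries by count, most frequent first; neither is confirmed here, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; the meaning of "max" is an assumption.
public class FrequencySketch {
    /** Counts words case-insensitively and returns the most frequent first. */
    public static List<Map.Entry<String, Integer>> topByFrequency(List<String> words, int max) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            counts.merge(word.toLowerCase(), 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // Compare counts as numbers (descending), then break ties alphabetically.
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        // A max of 0 or less is treated as "no limit" in this sketch.
        if (max > 0 && max < entries.size()) {
            return new ArrayList<>(entries.subList(0, max));
        }
        return entries;
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<>();
        for (String w : "the cat the dog cat the".split(" ")) {
            words.add(w);
        }
        for (Map.Entry<String, Integer> e : topByFrequency(words, 2)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

Breaking ties alphabetically keeps the output deterministic, which makes automated test cases easier to write.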