diff options
Diffstat (limited to 'gnu/usr.bin/ptx/TODO')
-rw-r--r-- | gnu/usr.bin/ptx/TODO | 94 |
1 files changed, 94 insertions, 0 deletions
diff --git a/gnu/usr.bin/ptx/TODO b/gnu/usr.bin/ptx/TODO new file mode 100644 index 0000000..6714313 --- /dev/null +++ b/gnu/usr.bin/ptx/TODO @@ -0,0 +1,94 @@ +TODO file for GNU ptx - last revised 05 November 1993. +Copyright (C) 1992, 1993 Free Software Foundation, Inc. +Francois Pinard <pinard@iro.umontreal.ca>, 1992. + +The following are more or less in decreasing order of priority. + +* Use rx instead of regex. + +* Correct the infinite loop using -S '$' or -S '^'. + +* Use mmap for swallowing files (maybe wrong when memory edited). + +* Understand and mimic `-t' option, if I can. + +* Sort keywords intelligently for Latin-1 code. See how to interface +this character set with various output formats. Also, introduce +options to inverse-sort and possibly to reverse-sort. + +* Improve speed for Ignore and Only tables. Consider hashing instead +of sorting. Consider playing with obstacks to digest them. + +* Provide better handling of format effectors obtained from input, and +also attempt white space compression on output which would still +maximize full output width usage. + +* See how TeX mode could be made more useful, and if a texinfo mode +would mean something to someone. + +* Provide multiple language support + +Most of the boosting work should go along the line of fast recognition +of multiple and complex boundaries, which define various `languages'. +Each such language has its own rules for words, sentences, paragraphs, +and reporting requests. This is less difficult than I first thought: + + . Recognize language modifiers with each option. At least -b, -i, -o, +-W, -S, and also new language switcher options, will have such +modifiers. Modifiers on language switchers will allow or disallow +language transitions. + + . Complete the transformation of underlying variables into arrays in +the code. + + . Implement a heap of positions in the input file. There is one entry +in the heap for each compiled regexp; it is initialized by a re_search +after each regexp compile. Regexps reschedule themselves in the heap +when their position passes while scanning input. In this way, looking +simultaneously for a lot of regexps should not be too inefficient, +once the scanning starts. If this works ok, maybe consider accepting +regexps in Only and Ignore tables. + + . Merge with language processing boundary processing options, really +integrating -S processing as a special case. Maybe, implement several +level of boundaries. See how to implement a stack of languages, for +handling quotations. See if more sophisticated references could be +handled as another special case of a language. + +* Tackle other aspects, in a more long term view + + . Add options for statistics, frequency lists, referencing, and all +other prescreening tools and subsidiary tasks of concordance +production. + + . Develop an interactive mode. Even better, construct a GNU emacs +interface. I'm looking at Gene Myers <gene@cs.arizona.edu> suffix +arrays as a possible implementation along those ideas. + + . Implement hooks so word classification and tagging should be merged +in. See how to effectively hook in lemmatisation or other +morphological features. It is far from being clear by now how to +interface this correctly, so some experimentation is mandatory. + + . Profile and speed up the whole thing. + + . Make it work on small address space machines. Consider three levels +of hugeness for files, and three corresponding algorithms to make +optimal use of memory. The first case is when all the input files and +all the word references fit in memory: this is the case currently +implemented. The second case is when the files cannot fit all together +in memory, but the word references do. The third case is when even +the word references cannot fit in memory. + + . There also are subsidiary developments for in-core incremental sort +routines as well as for external sort packages. The need for more +flexible sort packages comes partly from the fact that linguists use +kinds of keys which compare in unusual and more sophisticated ways. +GNU `sort' and `ptx' could evolve together. + + +Local Variables: +mode: outline +outline-regexp: " *[-+*.] \\|" +eval: (hide-body) +End: |