summaryrefslogtreecommitdiffstats
path: root/gnu/usr.bin/ptx/TODO
diff options
context:
space:
mode:
Diffstat (limited to 'gnu/usr.bin/ptx/TODO')
-rw-r--r--gnu/usr.bin/ptx/TODO94
1 files changed, 94 insertions, 0 deletions
diff --git a/gnu/usr.bin/ptx/TODO b/gnu/usr.bin/ptx/TODO
new file mode 100644
index 0000000..6714313
--- /dev/null
+++ b/gnu/usr.bin/ptx/TODO
@@ -0,0 +1,94 @@
+TODO file for GNU ptx - last revised 05 November 1993.
+Copyright (C) 1992, 1993 Free Software Foundation, Inc.
+Francois Pinard <pinard@iro.umontreal.ca>, 1992.
+
+The following are more or less in decreasing order of priority.
+
+* Use rx instead of regex.
+
+* Correct the infinite loop using -S '$' or -S '^'.
+
+* Use mmap for swallowing files (maybe wrong when memory edited).
+
+* Understand and mimic `-t' option, if I can.
+
+* Sort keywords intelligently for Latin-1 code. See how to interface
+this character set with various output formats. Also, introduce
+options to inverse-sort and possibly to reverse-sort.
+
+* Improve speed for Ignore and Only tables. Consider hashing instead
+of sorting. Consider playing with obstacks to digest them.
+
+* Provide better handling of format effectors obtained from input, and
+also attempt white space compression on output which would still
+maximize full output width usage.
+
+* See how TeX mode could be made more useful, and if a texinfo mode
+would mean something to someone.
+
+* Provide multiple language support
+
+Most of the boosting work should go along the line of fast recognition
+of multiple and complex boundaries, which define various `languages'.
+Each such language has its own rules for words, sentences, paragraphs,
+and reporting requests. This is less difficult than I first thought:
+
+ . Recognize language modifiers with each option. At least -b, -i, -o,
+-W, -S, and also new language switcher options, will have such
+modifiers. Modifiers on language switchers will allow or disallow
+language transitions.
+
+ . Complete the transformation of underlying variables into arrays in
+the code.
+
+ . Implement a heap of positions in the input file. There is one entry
+in the heap for each compiled regexp; it is initialized by a re_search
+after each regexp compile. Regexps reschedule themselves in the heap
+when their position passes while scanning input. In this way, looking
+simultaneously for a lot of regexps should not be too inefficient,
+once the scanning starts. If this works ok, maybe consider accepting
+regexps in Only and Ignore tables.
+
+ . Merge with language processing boundary processing options, really
+integrating -S processing as a special case. Maybe, implement several
+level of boundaries. See how to implement a stack of languages, for
+handling quotations. See if more sophisticated references could be
+handled as another special case of a language.
+
+* Tackle other aspects, in a more long term view
+
+ . Add options for statistics, frequency lists, referencing, and all
+other prescreening tools and subsidiary tasks of concordance
+production.
+
+ . Develop an interactive mode. Even better, construct a GNU emacs
+interface. I'm looking at Gene Myers <gene@cs.arizona.edu> suffix
+arrays as a possible implementation along those ideas.
+
+ . Implement hooks so word classification and tagging should be merged
+in. See how to effectively hook in lemmatisation or other
+morphological features. It is far from being clear by now how to
+interface this correctly, so some experimentation is mandatory.
+
+ . Profile and speed up the whole thing.
+
+ . Make it work on small address space machines. Consider three levels
+of hugeness for files, and three corresponding algorithms to make
+optimal use of memory. The first case is when all the input files and
+all the word references fit in memory: this is the case currently
+implemented. The second case is when the files cannot fit all together
+in memory, but the word references do. The third case is when even
+the word references cannot fit in memory.
+
+ . There also are subsidiary developments for in-core incremental sort
+routines as well as for external sort packages. The need for more
+flexible sort packages comes partly from the fact that linguists use
+kinds of keys which compare in unusual and more sophisticated ways.
+GNU `sort' and `ptx' could evolve together.
+
+
+Local Variables:
+mode: outline
+outline-regexp: " *[-+*.] \\| "
+eval: (hide-body)
+End:
OpenPOWER on IntegriCloud