Chomski
Infobox programming language
name = chomski
paradigm = scripting language
year = 2007
designer = mj bishop
typing = none; all data is treated as a string
implementations = [http://bumble.sourceforge.net/machine/c/ chomski]
influenced_by = Sed, Awk
operating_system = Cross-platform
website = [http://bumble.sourceforge.net/machine/ bumble.sourceforge.net/machine/]

chomski (named after the noted linguist Noam Chomsky) is a command line utility and language that can be used to parse and transform text patterns. Chomski (a) parses text files and (b) implements a programming language which can apply textual transformations to such files. It reads input files sequentially, character by character, applies the operations specified on the command line (or in a "chomski script"), and then outputs the result. Development began in 2006 as a Unix and Windows utility, and chomski is available today for Windows and Linux systems. Chomski derives a number of ideas and syntax elements from Sed, a command line text stream editor.

Usage
The following example shows a typical use of chomski, where the "-s" option indicates that the chomski expression follows:

    cat inputFileName | chomski -s '/(/ { until ")"; print; } clear;' > outputFileName

In the above script, only text within brackets is saved in the output file.
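The effect of the bracket example can be sketched in Python. This is an illustration under stated assumptions, not chomski's implementation: it assumes that the /(/ test compares the workspace (here, the single character just read) with "(", that until ")" keeps reading input into the workspace until it ends with ")", and that print emits the workspace. The function name extract_bracketed is hypothetical.

```python
def extract_bracketed(text: str) -> str:
    """Keep only the bracketed parts of text, reading one character at a time.

    Assumed semantics (not taken from the chomski documentation):
    the workspace holds the character just read; on "(" the reader
    keeps appending input until the workspace ends with ")".
    """
    output = []
    i = 0
    n = len(text)
    while i < n:
        workspace = text[i]              # read one character into the workspace
        i += 1
        if workspace == "(":             # assumed meaning of the /(/ test
            # assumed meaning of: until ")"
            while i < n and not workspace.endswith(")"):
                workspace += text[i]
                i += 1
            output.append(workspace)     # print
        # clear: the workspace is discarded before the next character
    return "".join(output)


if __name__ == "__main__":
    print(extract_bracketed("a (b) c (de) f"))
```

Under these assumptions the brackets themselves are part of the output, and an unterminated "(" at the end of the input is emitted as far as it was read.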
Under Unix (and Windows), chomski can be used as a filter in a pipeline:

    generate_data | chomski -s '/x/{clear;add "y";}print;clear;'

That is, generate the data, and then make the small change of replacing "x" with "y".
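The same character-by-character substitution can be written as a small Python stream filter. This is a sketch only: it assumes that the /x/ test fires when the workspace holds exactly "x", and that clear followed by add "y" replaces the workspace with "y". The name replace_x_with_y is hypothetical.

```python
from typing import Iterable, Iterator


def replace_x_with_y(chars: Iterable[str]) -> Iterator[str]:
    """Yield each input character, with "x" replaced by "y"."""
    for ch in chars:
        workspace = ch           # one character is read into the workspace
        if workspace == "x":     # assumed meaning of the /x/ test
            workspace = "y"      # clear; add "y";
        yield workspace          # print; clear;


if __name__ == "__main__":
    print("".join(replace_x_with_y("xylophone box")))
```

Because the function consumes any iterable of characters and yields its output lazily, it could be placed between a data source and a sink in the same way the chomski command sits in the pipeline.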
Several commands can be put together in a file called, for example, "substitute.chom" and then be applied using the "-f" option to read the commands from the file:

    cat inputFileName | chomski -f substitute.chom > outputFileName
Besides substitution, other forms of simple processing are possible. For example, the following uses the "plus" and "count" commands to count the number of lines in a file:

    cat inputFileName | chomski -s ' [-n] {plus;} <>{count;print;}'
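As a rough Python equivalent of the line-counting script, the sketch below assumes that "plus" increments a counter each time a newline is read and that "count" makes the counter value available for printing at the end of the input. Both readings are assumptions, and count_lines is a hypothetical name.

```python
def count_lines(text: str) -> int:
    """Count newline characters while scanning text one character at a time."""
    counter = 0
    for ch in text:
        if ch == "\n":       # assumed meaning of the [-n] test
            counter += 1     # assumed meaning of: plus
    # assumed meaning of: <> { count; print; } (report once at end of input)
    return counter


if __name__ == "__main__":
    print(count_lines("first\nsecond\nthird\n"))
```

Note that a final line without a trailing newline is not counted, since only newline characters advance the counter.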
The line-counting script above uses some of the following metacharacters and language features:
* The square brackets ([]) indicate the matching of a character class.
* The -n string matches a newline character.
* The <> string matches the end of the input stream (text file).
* The curly braces ({}) follow tests and group multiple statements.
* The semi-colon (;) terminates all statements.

Complex chomski constructs are possible, allowing it to serve as a simple, but highly specialised, programming language. Chomski has only one flow control statement (apart from the test structures <>, [], // etc.), namely the "check" command, which jumps back to the @@ label (no other labels are permitted).

History
The idea for chomski arose from the limitations of regular expression engines which use a "line by line" paradigm, and from the limitations of regular expressions when parsing nested text patterns. chomski evolved as a natural progression from the grep and sed commands. Development began approximately in 2006 and continued sporadically. [Developer's (m.j.bishop) personal recollection]

Samples
In the following example, chomski adds 3 whitespace characters at the beginning of each line of input:

    [-n] { add '   '; } print; clear;

* ([-n]) matches a newline character
* ({}) executes the brace block only if the match returned true
* ([http://bumble.sourceforge.net/machine/doc/print.txt.html print]) prints the current contents of the workspace to standard output
* (clear) deletes the contents of the workspace

The following script places brackets around urls in text:

    [-a] { while !' '; { put; clear; add'('; get; add')'; } } print; clear;

The test structures
Conditional commands are possible using the "test structures":
* Test if the workspace matches the string exactly (this is not a regular expression match): /somestring/ {commandlist;}
* Test if the workspace begins with the string: {commandlist;}
* Test if the workspace ends with the string: (somestring) {commandlist;}
* Test if the workspace matches the character class: [characterclass] {commandlist;}
* Test if the workspace matches any of the lines in the file: =filename= {commandlist;}
* Test if the workspace matches the current element of the tape structure: = {commandlist;}
* Test if the end of the input stream has been reached (this is equivalent to the Awk END label): <> {commandlist;}

These tests may be negated by adding the ! operator before (not after, as in sed) the test. For example,

    !(.) {print;}

prints the workspace if it does not end with a period (full stop).

Limitations
Chomski is not a general purpose programming language. Like sed, it is designed for a limited type of usage. chomski currently does not support Unicode strings, since the current implementation uses standard C character arrays. Chomski does not currently have a debugger for debugging complex scripts.

See also
* Sed
* Awk

External links
* [http://bumble.sourceforge.net/machine Source code and executables for chomski]
* [http://bumble.sourceforge.net/machine/doc/ Documentation for the chomski language]
* [http://sed.sourceforge.net Major sources for sed scripts, files, usage]
Wikimedia Foundation. 2010.