I wish to challenge the widespread belief among programmars that natural languages (such as English) are too ambiguous to be of any use in programming.
Below I sketch a proof based on the definitions and concepts contained in Kenneth H. Rosen's "Discrete Mathematics and Its Applications", 4th edition, Chapter 10: Modeling Computation (Chapter and Section numbers may differ in other editions).
Type 0 grammars have no restrictions on production rules. Natural languages seem to be type 0 grammars (although the book does not say so explicitly).
Type 1 grammars are context-sensitive. {0n1n2n | n = 0,1,2,3...} is a context-sensitive language.
Type 2 grammars are context-free. Programming languages are type 2 grammars (although there may be some context-sensitivity: overloaded operators? polymorphism?).
Type 3 grammars are regular. Regex is a regular grammar (although there may be features of regexps as implemented in modern programming languages that give it some kind of context awareness).
From Section 10.1:
From these definitions we see that every type 3 grammar is a type 2 grammar, every type 2 grammar is a type 1 grammar, and every type 1 grammar is a type 0 grammar.The above means that any production rules required for a programming language can be implemented in a natural language. If the natural language doesn't already have the production rules, they can be added, since natural language does not have a fixed set of production rules. If the new production rules conflict with existing ones (causing ambiguity), you can simply define at the outset which ruleset you are using.
{0n1n2n | n = 0,1,2,3...}This formal language representation can also be expressed without ambiguity in English:
A number of 0s followed by the same number of 1s, followed by the same number of 2s.The book describes a phrase-structure grammar that generates this language:
One grammar that generates the set {0n1n2n | n = 0,1,2,3...} is G = (V,T,S,P) with V = {0,1,2,A,B}, T = {0,1,2}, starting state S, and productions S -> 0SAB, S -> lambda, BA -> AB, 0A -> 01, 1A -> 11, 1B -> 12, 2B -> 22.Here is a ruby program that also recognizes this set:
def recognize? (input)
if input =~ /^(0*)(1*)(2*)$/
n0 = $1.size; n1 = $2.size; n2 = $3.size
if n0 == n1 and n1 == n2 then return "true" end
end
return "false"
end
print("\n> "); $stdout.flush
while (line = gets) !~ /^$/
puts recognize?(line)
print("\n> "); $stdout.flush
end
C:\>tm.rb
> 012
true
> 0012
false
> 000111222
true
>
The production rules, near as I can figure, are abstracted out by ruby's regex implementation, the if-then control structure, and the operators.
It can be shown that a set can be recognized by a Turing machine if and only if it can be generated by a type 0 grammar, or in other words, if the set is generated by a phrase-structure grammar.Since English is a phrase-structure grammar, we can use a Turing machine to recognize it. One problem is, as noted in Section 10.1:
The syntax of a natural language, that is, a spoken language, such as English, French, German, or Spanish, is extremely complicated. In fact, it does not seem possible to specify all the rules of syntax for a natural language.One way to use a Turing machine to recognize English is to have an open-ended set of production rules, which can be modified by a running program. My ai programs for example can add new regexes and handlers at runtime. When the program is able to modify itself without explicit human instruction, we will be approaching true AI...