Is Sanskrit The Best Language To Program Computers and AI?


In the past month, I came across three seemingly unconnected experiences which set me thinking about how humans and computers interact with each other. The first was a conversation with a couple of stealth startups trying to figure out how to democratize coding by programming computers in natural spoken language. The second was a set of startling announcements by Big Tech on how generative AI in the form of Large Language Models (LLMs) could program machines and even other AI models, an infinitely recursive loop if ever there was one. But it was the third that really set me off: a brilliant article by Sanjana Ramachandran in Fifty Two, exploring a long-standing myth that the best language to program computers and AI is Sanskrit.

Ramachandran quotes a variety of sources – Indian government officials, a motley bunch of academicians and professors, and Indian-American author Rajiv Malhotra, who goes so far as to claim that Sanskrit should be credited with the last 20 years of development in Natural Language Processing (NLP), the technology behind prominent models like GPT-3 and DALL-E 2. The claims are wide-ranging: Sanskrit is the most ‘scientific’ language, and so the ‘best to programme computers, or code AI/ML’; it is the ‘language for future super computers’; and so on. However, one common source which everyone likes to quote, and which Ramachandran explores in detail, is ‘NASA.’ Yes, the same NASA which sends rockets into space and is often invoked as the gold standard of authenticity for any claim. The reference actually has a published source: a 1985 paper, ‘Knowledge Representation in Sanskrit and Artificial Intelligence,’ by NASA researcher Rick Briggs. Briggs writes, “Understandably, there is a widespread belief that natural languages are unsuitable for the transmission of many ideas that artificial languages can render with great precision and mathematical rigor. But this dichotomy, which has served as a premise underlying much work in the areas of linguistics and artificial intelligence, is a false one. There is at least one language, Sanskrit, which for the duration of almost one thousand years was a living spoken language with a considerable literature of its own.” He goes on to explain how Sanskrit’s uniquely structured grammar and its word- and sentence-structuring properties appeal to how logic- and structure-driven machines ‘think.’ His paper lays out knowledge representation schemes and describes how Sanskrit is best equipped to address them.
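To make the phrase ‘knowledge representation scheme’ concrete, here is a minimal sketch – entirely my own illustration, not code from Briggs’s paper – of the semantic-net style of representation his field worked with: a sentence reduced to relation triples that a rules-based system can manipulate.

```python
# A minimal sketch of a semantic-net style knowledge representation:
# a sentence reduced to (head, relation, dependent) triples.
# Illustrative only; not code from Briggs's 1985 paper.

from typing import List, Tuple

Triple = Tuple[str, str, str]  # (head, relation, dependent)

def show(facts: List[Triple]) -> None:
    """Print each fact as an edge in a tiny semantic net."""
    for head, relation, dependent in facts:
        print(f"{head} --{relation}--> {dependent}")

# "John gave the ball to Mary" as word-order-independent triples:
facts = [
    ("give", "agent", "John"),
    ("give", "object", "ball"),
    ("give", "recipient", "Mary"),
]
show(facts)
```

The point of such a scheme is exactly what Briggs highlights: the meaning survives even when the surface word order changes, because it lives in the relations, not in the sentence.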

Sanskrit largely follows Panini’s grammar: his Ashtadhyayi, composed around 500 BC, has 3,976 rules governing the spoken language. Dheepa Sundaram, quoted in Fifty Two, emphasizes how every classical Sanskrit word originates from about two thousand base verbal roots, or dhatus, each of which is derived from “distinct linguistic units—phonemes and morphemes—such that Ashtadhyayi functions as an algorithm.” Fifty Two quotes Stanford professor Paul Kiparsky describing how, in Sanskrit, every sentence is “seen as a little drama played out by an Agent”—the doer—“and a set of other actors which may include a Recipient, Goal, Instrument, Location and Source.” What this allows is that a sentence’s meaning “can be represented in these six basic categories, and by the relationships between them, independent of the actual words in it.” This does sound very much like how an AI would ‘think,’ especially Symbolic AI or GOFAI (Good Old-Fashioned AI), which reigned before Deep Learning and neural networks muscled in. GOFAI was a rules-based, ‘top-down’ AI which required knowledge representation systems; hence, the argument goes, the suitability of Sanskrit.

IIT professor Pawan Goyal has a different take: he believes that Sanskrit works as a bridge language. Any other natural spoken language can be mapped onto Sanskrit, which provides an ‘annotated format and exhaustive grammar,’ and this in turn can be used to program AI/ML. “Because Sanskrit had this formal language and formal grammar,” computer science professor Deepak Kumar says in Fifty Two, “whatever you say could fit into a knowledge representation system.”

This bridge-language concept is where Microsoft and OpenAI are coming from when they think of GPT-3 as being one. “If you can describe what you want to do in natural language, GPT-3 will generate a list of the most relevant formulas for you to choose from,” said Microsoft CEO Satya Nadella. “The code writes itself.” Microsoft is uniquely positioned to do this: it has a billion-dollar investment in OpenAI, the creator of GPT-3, and owns GitHub, the largest open-source code repository in the world. IBM is doing something similar with CodeNet, a dataset of 14 million code samples across fifty programming languages: an attempt to do for AI coding what ImageNet did for computer vision, opening up the possibility of Natural Language Coding, or NLC.
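To make Kiparsky’s ‘little drama’ concrete, here is a toy sketch, with field names of my own choosing, of how those six categories could be captured as a data structure whose contents stay the same however the words of a sentence are reordered:

```python
# A toy sketch of Kiparsky's six categories as a data structure.
# Field names are my own; illustrative, not from the Fifty Two piece.

from dataclasses import dataclass
from typing import Optional

@dataclass
class SentenceMeaning:
    action: str                       # the verb, built from a dhatu (root)
    agent: str                        # the doer
    recipient: Optional[str] = None
    goal: Optional[str] = None
    instrument: Optional[str] = None
    location: Optional[str] = None
    source: Optional[str] = None

# "From the library, the teacher sends the student a book by courier"
# and any reordering of it reduce to the same representation:
meaning = SentenceMeaning(
    action="send",
    agent="teacher",
    recipient="student",
    instrument="courier",
    source="library",
)
print(meaning)
```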
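And for a flavor of what ‘the code writes itself’ might look like in practice, here is a hedged sketch of Natural Language Coding: an English comment is sent to a large language model, which completes it with code. It assumes OpenAI’s legacy completions interface and a Codex-era model name; newer versions of the SDK expose a different API.

```python
# A sketch of Natural Language Coding (NLC): describe the task in
# English, let a large language model write the code. Assumes the
# legacy openai-python (<1.0) completions API and a Codex-era model;
# both are assumptions, and current SDK versions differ.

import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

prompt = (
    "# Python 3\n"
    "# Write a function that returns the n-th Fibonacci number.\n"
)

response = openai.Completion.create(
    engine="code-davinci-002",  # Codex-era model; an assumption
    prompt=prompt,
    max_tokens=150,
    temperature=0,
)

print(response.choices[0].text)
```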
As the world digitizes around us, learning how to code has become a passport to success, much like knowing English was earlier. I often talk about ‘coding being the new English’; but can English be the new coding, or would that honor go to Sanskrit?
