scan GRETIL

One way you can help improve the world's stock of digital Sanskrit data is to use skrutable's whole-file meter identification to detect certain kinds of textual problems and to contribute toward an open meter dataset for reference and benchmarking purposes.

See the cumulative results so far here.

how-to

Your results will only be as good as your input data, and to do anything with them you'll need to know at least a little about Sanskrit language and meter (aka prosody). Given those two things, it's really only a matter of having a good tool, interest, and some patience. Here's one way to go about it.

Step 1: Identify a text you'd like to work on and find it online. GRETIL texts include some of the highest-quality ones around and are relatively easy to work with, so they're a good starting point. For example: the Bhagavadgītā (BhG) raw GRETIL file.

Step 2: Clean up the file (using search-and-replace, etc.) so that it is a text file with exactly one (potential) whole verse per line. Half an anuṣṭubh śloka is ok, too. For example: BhG input cleaned
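If search-and-replace gets tedious, the cleanup in Step 2 can be scripted. Here's a minimal sketch in Python, assuming (and this is only an assumption about your raw file, so check it first) that half-verses end in a single daṇḍa "|" and whole verses end in a double daṇḍa "||" followed by an optional verse number. The function name `one_verse_per_line` is made up for this illustration and is not part of skrutable.

```python
import re


def one_verse_per_line(raw_text):
    """Join the lines of each verse and drop dandas and verse numbers.

    Assumes a single danda "|" ends a half-verse and a double danda "||"
    ends a verse, with anything after the first "||" on a line being a
    verse number or similar that can be discarded.
    """
    verses, parts = [], []
    for line in raw_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Keep only the text before the first double danda, if there is one.
        body, sep, _ = line.partition("||")
        parts.append(body.replace("|", " "))
        if sep:  # a double danda closes the verse
            verse = re.sub(r"\s+", " ", " ".join(parts)).strip()
            if verse:
                verses.append(verse)
            parts = []
    # Leftover text is an unterminated verse; keep it so it can be inspected.
    if parts:
        leftover = re.sub(r"\s+", " ", " ".join(parts)).strip()
        if leftover:
            verses.append(leftover)
    return verses


SAMPLE = """dharmakṣetre kurukṣetre samavetā yuyutsavaḥ |
māmakāḥ pāṇḍavāś caiva kim akurvata saṃjaya || 1.1 ||
"""

for verse in one_verse_per_line(SAMPLE):
    print(verse)
```

Headers, speaker lines (such as "uvāca" lines), and editorial notes are not handled here and would get merged into the neighboring verse, so remove those first or extend the sketch to skip them.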

Step 3: Submit the cleaned file to skrutable's meter identifier using the "whole file" button. skrutable is really fast, so the 700 verses of the Gītā won't take long at all. For example: BhG raw output and BhG output cleaned. There's also the option to see summaries in spreadsheet form.

Step 4: Now read through the results, looking for irregularities (you can use regexes to search more effectively). Where there's a metrical "problem", there just might also be a textual "problem" in need of fixing or other attention.

As you might expect, the BhG text is in pretty good shape, having received a lot of attention over the years. Those non-standard upajātis sure are interesting though...

Meanwhile, there's plenty of lower-hanging fruit elsewhere, such as in the huge text of the Rāmāyaṇa: (raw in — Dec 2018 version with semicolons), (clean in), (raw out), (clean out), (tallies of meter types), (breakdown and problems).

There one definitely finds some clear textual problems: hyper- or hypometrical lines, typos that result in different syllable weights, and so on. (In development: skrutable itself can help automatically categorize and highlight problematic cases with its TSV summary output...) You can manually look for the following: invalid ("asamīcīna") anuṣṭubhs; samavṛttas with fewer than four valid pādas ("yuktāḥ pādāḥ"); verses recognized as having the shape of samavṛttas or ardhasamavṛttas but not yet known by name ("ajñātasamavṛtta", "ajñātārdhasamavṛtta"); upajātis that are not the standard indravajrā-upendravajrā type (e.g. "upajāti paṅkti") or with pādas of valid lengths but not yet known by name ("ajñāta"); verses that qualify equally to be one thing or another ("atha vā"); or verses not successfully categorized ("na kiṃcid adhyavasitam").
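The manual search can also be roughed out in a few lines of Python. This is only a sketch: it looks for the quoted labels line by line in whatever output text you saved, and makes no assumption about the layout of that file beyond the labels appearing somewhere in it. The names `LABELS`, `flag_problems`, and `tally` are invented for this example and are not part of skrutable.

```python
import re
import unicodedata
from collections import Counter

# Labels quoted in the list above. Where exactly they appear in a real
# output file is an assumption, so this sketch just checks every line.
LABELS = [
    "asamīcīna",
    "yuktāḥ pādāḥ",  # may need refining by pāda count for your output
    "ajñātasamavṛtta",
    "ajñātārdhasamavṛtta",
    "ajñāta",
    "atha vā",
    "na kiṃcid adhyavasitam",
]


def _nfc(text):
    # Normalize so precomposed and decomposed diacritics compare equal.
    return unicodedata.normalize("NFC", text)


# The lookarounds keep "ajñāta" from matching inside "ajñātasamavṛtta".
PATTERNS = {
    label: re.compile(r"(?<!\w)" + re.escape(_nfc(label)) + r"(?!\w)")
    for label in LABELS
}


def flag_problems(output_text):
    """Return (line_number, label, line) for every line containing a label."""
    hits = []
    for number, line in enumerate(_nfc(output_text).splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((number, label, line.strip()))
    return hits


def tally(hits):
    """Count how many flagged lines carry each label."""
    return Counter(label for _, label, _ in hits)
```

To use it, read your saved output file into a string, pass it to `flag_problems`, and work through the reported line numbers. `tally` gives a quick per-label count, which is handy for deciding where to start. Non-standard upajātis like "upajāti paṅkti" are left out on purpose, since picking them out depends on how the standard type is written in your output.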

Step 5: Now, if you can, try to come up with solutions for the textual problems, or in other words, explanations for the interesting stuff that you find in your text.

Then, if you want, share your results with me or others. I can add them to the cumulative results here. This work takes time, patience, and expert knowledge (preparing the data, scrutinizing each problem verse, and so on), so it definitely needs to be a team effort!

And if you were able to make improvements to the text, you can send it on back to GRETIL or wherever else. I'm happy to help you do that, if need be.

Questions? Write me, we'll talk.