Fedi, I have a #ComputerScience (maybe #Linguistics?) question I need your lovely guidance for.
I have a design problem about grammar ambiguity ish stuff and want to find reading, resources or theory I can check out to get a good understanding of the problem space.
In particular, I'm trying to find different techniques for reducing ambiguity when a given word can appear in multiple parts of the syntax.
I'm being deliberately vague because I'm trying to think about generalisable heuristics here, but here's an example of the type of problem I'm thinking about. Sorry it's computery:
You have two strings (or lists of tokens) you want to combine into a single string, separated by a delimiter, such that both strings can be retrieved again. But, that delimiter can show up in either of the two strings. What are the ways you can sanitize the input strings or format the final string to clarify where one string ends and the other begins, and how do various restrictions in the input strings affect those precautions?
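To make the ambiguity concrete, here's a tiny sketch. The comma delimiter and the `naive_join` name are just placeholders I picked for illustration:

```python
DELIM = ","  # assumed delimiter, any token would do


def naive_join(a: str, b: str) -> str:
    # Plain concatenation with no sanitising at all.
    return a + DELIM + b


# Two different input pairs produce the same combined string,
# so the original pair cannot be recovered from the output alone.
print(naive_join("a,b", "c"))
print(naive_join("a", "b,c"))
```

Both calls give the same result, which is exactly the situation I want to avoid.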
Possible techniques I've thought of are:
- designate an escape token and prefix every instance of the delimiter within the strings with it (e.g. \"), which is pretty universally used nowadays
- when the delimiter appears inside a string, double it to distinguish it from the real delimiter, e.g. "this string contains "" one quote mark"
- Another crazy option would be interlacing the two strings so all even tokens belong to string 1 and odd ones are string 2. You'd have length difference issues, but maybe there are other solutions taking a similar thought process.
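Here's a rough sketch of the first two ideas, assuming a comma delimiter and a backslash escape token. All the names (`escape`, `join_escaped`, `split_escaped`, `double_delims`) are made up for this example. One thing I had to assume for the escape version: the escape token itself also gets escaped, otherwise a string that already contains a backslash becomes ambiguous.

```python
DELIM = ","  # assumed delimiter
ESC = "\\"   # assumed escape token (a single backslash)


def escape(s: str) -> str:
    # Escape the escape token first, then the delimiter.
    return s.replace(ESC, ESC + ESC).replace(DELIM, ESC + DELIM)


def join_escaped(a: str, b: str) -> str:
    return escape(a) + DELIM + escape(b)


def split_escaped(joined: str):
    parts = [[]]
    i = 0
    while i < len(joined):
        ch = joined[i]
        if ch == ESC and i + 1 < len(joined):
            # The character after an escape token is always literal.
            parts[-1].append(joined[i + 1])
            i += 2
        elif ch == DELIM:
            # An unescaped delimiter starts the next string.
            parts.append([])
            i += 1
        else:
            parts[-1].append(ch)
            i += 1
    if len(parts) != 2:
        raise ValueError("expected exactly one unescaped delimiter")
    return "".join(parts[0]), "".join(parts[1])


def double_delims(s: str) -> str:
    # The doubling idea, with nothing enclosing the strings.
    return s.replace(DELIM, DELIM * 2)


# Escaping round-trips, even when the inputs contain both special tokens.
print(split_escaped(join_escaped("a,b", "c\\d")))

# Bare doubling does not: these two different pairs give the same output.
print(double_delims("x,") + DELIM + double_delims("y"))
print(double_delims("x") + DELIM + double_delims(",y"))
```

As far as I can tell, doubling on its own only stays unambiguous when the delimiter can never sit at the very start or end of a string, or when the strings are enclosed and separated by something else, the way quoted CSV fields are.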
So yeah I'm looking for stuff like that so I can figure out good patterns for unambiguous yet elegant grammars. For a tad more context, I'm thinking about command line argument formats, trying to think of the most user friendly ways one can handle complex data as a list of arguments.
Also, please boost and let me know if there are hashtags I should include etc. #CompSci #programming #askfedi #TechSupport