How should I construct a standard text to test or benchmark a system for automated proofreading?
If I set out to compare the performance of solutions like:
- Microsoft Word
Where do I start, beyond thinking up my own test texts?
To start with, I have found, for example, this one, where the authors claim that Microsoft Word did not catch any of the 19 errors, but the text is really quite short.
Maybe you know of something more elaborate that I could not find.
You need to find a way to present the same bit of text to each of the engines, trigger the analysis, and then extract the result of the analysis.
This is probably just going to be the textual output, unless you want to get really involved in scripting to pull some stats from the engines themselves.
You could then use this to compare lots of different bits of text that might be known to cause specific issues.
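As a rough illustration of that comparison, here is a minimal Python sketch. It assumes each engine can be wrapped in a function that takes a text and returns its corrected version. The names `changed_words`, `score_engine` and the toy `naive_engine` are hypothetical, and the sample sentence is made up for the example. Real engines would each need their own wrapper.

```python
import difflib

def changed_words(original, corrected):
    """Return the set of words from `original` that the engine altered or removed."""
    a, b = original.split(), corrected.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changed = set()
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            changed.update(a[i1:i2])
    return changed

def score_engine(engine, samples):
    """Score one engine against samples of (text, set of known wrong words).

    Returns (errors found, total known errors, false alarms).
    """
    found = total = false_alarms = 0
    for text, known in samples:
        flagged = changed_words(text, engine(text))
        found += len(flagged & known)
        total += len(known)
        false_alarms += len(flagged - known)
    return found, total, false_alarms

# Hypothetical stand-in for a real proofreading engine: fixes two typos only.
def naive_engine(text):
    fixes = {"teh": "the", "recieve": "receive"}
    return " ".join(fixes.get(word, word) for word in text.split())

# Made-up sample: two spelling errors and one wrong-word error ("their").
SAMPLES = [
    ("I will recieve teh package their tomorrow", {"recieve", "teh", "their"}),
]

print(score_engine(naive_engine, SAMPLES))  # (2, 3, 0)
```

Comparing at word level is a simplification: a word that occurs twice in a sample is counted once, and punctuation attached to a word is treated as part of it. For a larger test set you would probably want to record error positions rather than the words themselves.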
Depending on how many text samples you want to test and how often you want to test them, you might want this automated; if it is a one-off, it might not be worth the effort.
A simple workflow could be to use the clipboard. You could probably do more with PowerShell scripting or the scripting languages built into Word and LibreOffice.
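If you do automate it, the outer loop could look something like the Python sketch below. It assumes the samples are `.txt` files in one folder and that each engine is already wrapped in a function from text to corrected text. How such a wrapper actually drives Word or LibreOffice is left open here. `run_batch` and the column names are hypothetical choices for the example.

```python
import csv
from pathlib import Path

def run_batch(sample_dir, engines, report_path):
    """Run every engine on every .txt sample and write one CSV row per pair.

    `engines` maps an engine name to a function: text -> corrected text.
    """
    rows = []
    for sample in sorted(Path(sample_dir).glob("*.txt")):
        text = sample.read_text(encoding="utf-8")
        for name, engine in engines.items():
            output = engine(text)
            rows.append({
                "sample": sample.name,
                "engine": name,
                "changed": output != text,  # did the engine touch the text at all?
                "output": output,
            })
    with open(report_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(
            fh, fieldnames=["sample", "engine", "changed", "output"]
        )
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Calling `run_batch("samples", {"engine_a": wrap_a, "engine_b": wrap_b}, "report.csv")`, with `wrap_a` and `wrap_b` being your own wrappers, would give you one report you can rerun whenever the samples or the engines change.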