Sunday, May 11, 2008

How to finally get the leo.org dictionary on the Windows command line

Living in South Tyrol, we often need to write in German as well as Italian. Since we also read technical material in English, we sometimes end up not knowing a word in any language any more. This is where the wonderful leo dictionary comes in. On Linux there is a leo command that works from the command line, which makes lookups really fast without having to open a browser.

This command (apt-get install leo) is one of the many things I miss on Mac OS X and Windows. So, since I wanted to get used to Groovy anyway, I tried to fix that. This first version is a bit rough, but it does everything it is expected to:


#! /usr/bin/env groovy

if(!args || (args[0] != "ende" && args[0] != "itde" && args[0] != "frde" && args[0] != "esde")){
  println "USAGE:"
  println "groovy leo.groovy ende english or german words"
  println "or"
  println "groovy leo.groovy itde italian or german words"
  println "or"
  println "groovy leo.groovy frde french or german words"
  println "or"
  println "groovy leo.groovy esde spanish or german words"
  System.exit(0);
}

def langToken = args[0]
args = args as List
args.remove(0)

def searchString = args.join(" ");
searchString = URLEncoder.encode(searchString);

def queryUrl = "http://dict.leo.org/${langToken}?lp=${langToken}"+
  "&lang=de&searchLoc=0&cmpType=relaxed&sectHdr=on&"+
  "spellToler=on&search=${searchString}&relink=on"

def parser = new org.cyberneko.html.parsers.SAXParser()
def page = new XmlParser(parser).parse(queryUrl)

def depth = page.BODY[0].TABLE

try{
  depth.each{ body ->
    if(body?.attribute("id").equals("body")){
      def tr = body.TR
      tr.each{ bodyRow ->
        if(bodyRow?.attribute("id").equals("main")){
          def td = bodyRow.TD
          td.each{ bodyCol ->
            if(bodyCol?.attribute("id").equals("contentholder")){
              bodyCol.each{ content ->
                if(content?.attribute("name").equals("WORDS")){
                  def form = content
                  form.each{ formContent ->
                    if(formContent?.attribute("id")?.equals("results")){
                      def rows = formContent.TR
                      rows.each{ row ->
                        def totLine = row.value()
                        if(totLine.size() == 5){
                          // then it is a translation line
                          def first = totLine.get(1);
                          def second = totLine.get(3);

                          def firstKids = first?.children()
                          firstKids.each{ kid ->
                            if(kid instanceof String){
                              print kid + " "
                            }else if(kid instanceof Node){
                              def k = kid?.children();
                              k.each{
                                if(it instanceof String){
                                  print it + " "
                                }else{
                                  print it.text() + " "
                                }
                              }
                            }
                          }
                          print "\t\t\t\t"

                          def secondKids = second?.children()
                          secondKids.each{ kid ->
                            if(kid instanceof String){
                              print kid + " "
                            }else if(kid instanceof Node){
                              def k = kid?.children();
                              k.each{
                                if(it instanceof String){
                                  print it + " "
                                }else{
                                  print it.text() + " "
                                }
                              }
                            }
                          }
                          println ""
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}catch(Exception e){
}
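
A note on the roughness: the two loops that print the left and the right cell of a translation row are identical, so they could be folded into one helper. What follows is only a sketch of that idea, not a tested replacement for the script. The name cellText is made up for illustration, and the cell is built by hand here instead of being parsed from a leo.org page.

```groovy
// Hypothetical helper: flattens one table cell to text, following the same
// rules as the two printing loops in the script (strings are taken as-is,
// child nodes are walked one level deep).
def cellText(Node cell) {
    def out = new StringBuilder()
    cell.children().each { kid ->
        if (kid instanceof String) {
            out << kid << " "
        } else if (kid instanceof Node) {
            kid.children().each { inner ->
                out << (inner instanceof String ? inner : inner.text()) << " "
            }
        }
    }
    out.toString()
}

// Hand-built stand-in for a result cell, roughly <TD>to <B>groove</B></TD>
def cell = new Node(null, "TD", [])
cell.children().add("to")
new Node(cell, "B", "groove")

println cellText(cell)
```

With such a helper, the body of the translation-line branch would shrink to something like print cellText(first) + "\t\t\t\t" followed by println cellText(second).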


For this to work, you will need two additional libraries in the lib folder of your Groovy home:
  • the nekohtml.jar (which is not standard; you can find it inside the zip downloaded here)
  • the xerces jar for XML parsing (do a locate *xerces*.jar and I would bet you have something there :))

Alright, so let me see what the German word for groovy is:
>>> leo.groovy ende groovy
ENGLISCH DEUTSCH
groovy Ê adj. fetzig
groovy Ê adj. handwerksmŠ§ig
groovy Ê adj. toll
groovy Ê adj. in Ordnung

Alright, some problems to solve with the character set on the Mac OS X console, but hey, FETZIG!!!
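
My guess about the character set problem (and it is only a guess from the output above) is that the JVM writes its text in the platform default encoding while the terminal expects a different one. One way to take the default out of the picture is to print through a stream with an explicit charset. The sketch below assumes the terminal is set to UTF-8; the name printerFor is made up for illustration.

```groovy
// Sketch: wrap a stream in a PrintStream with an explicit charset, so the
// bytes written no longer depend on the JVM's platform default encoding.
def printerFor(OutputStream target, String charsetName) {
    new PrintStream(target, true, charsetName)
}

// Assumption: the terminal expects UTF-8; use another charset name if not.
def utf8Out = printerFor(System.out, "UTF-8")

// The umlaut and sharp s are written as escapes to avoid source-encoding issues.
utf8Out.println "handwerksm\u00e4\u00dfig"
```

In the script itself, the print and println calls would then have to go through utf8Out instead of the default output.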
