My Java program (or rather, a part of it) sends a request to a web service and receives RDF strings that include ancient Greek words in Unicode. I wrote the program in NetBeans and, until now, there had been no problem at runtime, either inside NetBeans or as a standalone jar under Linux and Windows XP. Now, all of a sudden, the Greek words in the RDF come back garbled like this:
á¼€
At first, I thought this was a Windows XP problem, but the problem persisted when I checked under Windows 7. I found out that I was running OpenJDK under Linux, and I have since been able to reproduce the issue using Oracle Java. This is the relevant code (of course, I may have tunnel vision, so please tell me if you need more):
    try {
        HttpClient client = new DefaultHttpClient();
        HttpGet get;
        get = new HttpGet(URL+URLEncoder.encode(form, "UTF-8"));
        HttpResponse response = client.execute(get);
        if (201 == response.getStatusLine().getStatusCode()) {
            HttpEntity respEnt = response.getEntity();
            BufferedReader reader = new BufferedReader(new InputStreamReader(respEnt.getContent()));
            StringBuilder sb = new StringBuilder();
            char[] cbuffer = new char[256];
            int read;
            while ((read = reader.read(cbuffer)) != -1) {
                sb.append(cbuffer,0,read);
            }
            //System.out.println(sb.toString());
            rdf = new String(sb.toString().getBytes("UTF-8"),"UTF-8");
        } else {
            System.err.println("HTTP request failed.");
        }
    } catch (IOException e) {
        System.err.println("Problem with the HTTP request.");
    }
The web service is the Perseus morphology service; it can be found here: http://services.perseids.org/bsp/morphologyservice/analysis/word?lang=grc&engine=morpheusgrc&word=. Try "word=μῆνιν", for example. I really don't know how or when the RDF is generated.
I would be very grateful for further insights!
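One possible explanation, offered as a sketch rather than a confirmed diagnosis: `new InputStreamReader(respEnt.getContent())` in the snippet above is given no charset, so it decodes the bytes with the platform default, which is not necessarily UTF-8 (on many Windows setups it is windows-1252). The three characters á¼€ are exactly what the UTF-8 bytes of ἀ (U+1F00, bytes E1 BC 80) turn into when they are decoded as windows-1252. The example below simulates the response body with a `ByteArrayInputStream` instead of calling the service. The class name `CharsetDecodingSketch` and the helper `decode` are made up for illustration, and the sketch assumes that the service really sends UTF-8 and that the JRE provides the windows-1252 charset.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the original program.
class CharsetDecodingSketch {

    // Decodes the given bytes through an InputStreamReader using an explicit charset,
    // mirroring the read loop from the question.
    static String decode(byte[] body, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(body), charset)) {
            char[] buffer = new char[256];
            int read;
            while ((read = reader.read(buffer)) != -1) {
                sb.append(buffer, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the response body: the UTF-8 bytes of U+1F00 (E1 BC 80).
        byte[] body = "\u1F00".getBytes(StandardCharsets.UTF_8);

        String good = decode(body, StandardCharsets.UTF_8);
        // Assumption: windows-1252 is available in this JRE.
        String bad = decode(body, Charset.forName("windows-1252"));

        // Compare against escapes instead of printing Greek, so the console
        // encoding cannot distort the result.
        System.out.println(good.equals("\u1F00"));             // prints "true"
        System.out.println(bad.equals("\u00E1\u00BC\u20AC"));  // prints "true" (the garbled form)
    }
}
```

If that is the cause, changing the reader in the original code to `new InputStreamReader(respEnt.getContent(), "UTF-8")` should be enough. Note that `new String(sb.toString().getBytes("UTF-8"),"UTF-8")` cannot repair text that was already decoded with the wrong charset, because it only round-trips the same string.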
You could use a BufferedInputStream and the [read()](docs.oracle.com/javase/7/docs/api/java/io/…, int, int)) to read into a byte array, which you can then print to see exactly what you're getting. I'm not sure whether the response changes depending on the machine that requests it, but this way you can at least confirm that, if the bytes you receive are the same everywhere, the server isn't doing something wonky. I wouldn't call it a necessary step, though.
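The byte-level check suggested above could look roughly like this. It is only a sketch: the class `RawByteDumpSketch` and its helpers are hypothetical names, and a `ByteArrayInputStream` stands in for the real response stream (in the original program that would be `respEnt.getContent()`).

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for inspecting a response without any character decoding.
class RawByteDumpSketch {

    // Reads every byte from the stream using read(byte[], int, int).
    static byte[] readAllBytes(InputStream in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (BufferedInputStream bin = new BufferedInputStream(in)) {
            byte[] buffer = new byte[256];
            int read;
            while ((read = bin.read(buffer, 0, buffer.length)) != -1) {
                out.write(buffer, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Formats bytes as space-separated, two-digit uppercase hex values.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the response stream: the UTF-8 bytes of U+1F00.
        InputStream fake = new ByteArrayInputStream("\u1F00".getBytes(StandardCharsets.UTF_8));
        System.out.println(toHex(readAllBytes(fake))); // prints "E1 BC 80"
    }
}
```

If the hex dump is identical on every machine, the server is sending the same bytes each time and the difference has to come from how the client decodes them.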