JMeter CSV Data Set Config is corrupting Japanese strings stored as valid UTF-8; I get question marks instead
- by Mark Bennett
I read in search terms from a simple text file to send to a search engine.
It works fine in English, but gives me ???? for any Japanese text.
Text with mixed English and Japanese does show the English text, so I know it's reading it.
What I'm seeing:
Input text:
Snow Leopard (followed by 15 Japanese characters)
Turns into:
Snow Leopard ???????????????
This is what ends up in the POST field of my HTTP request.
If I set JMeter to encode the data, it just puts in the percent sequence for question marks.
Interesting note:
In the example above there are 15 Japanese characters, and then 15 question marks, so at some point it's being seen as full characters and not just bytes.
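One way to read the one-question-mark-per-character pattern, sketched in plain Java outside JMeter: if the text is decoded correctly and then encoded into a charset that cannot represent Japanese, each character becomes one `?`. If the UTF-8 bytes were instead decoded with the wrong charset, each Japanese character would turn into three junk characters. ISO-8859-1 is only an assumed stand-in here; I don't know which charset is actually in play, and the class and method names are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class QuestionMarkDemo {
    // Encode the text into the given charset, then decode it again.
    // Characters the charset cannot represent are replaced by '?'.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    // Decode UTF-8 bytes as if they were ISO-8859-1 (the "wrong decoder" case).
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Three Japanese characters written as escapes so the source file
        // encoding does not matter.
        String term = "Snow Leopard \u65e5\u672c\u8a9e";

        // One '?' per Japanese character: "Snow Leopard ???"
        System.out.println(roundTrip(term, StandardCharsets.ISO_8859_1));

        // Three junk characters per Japanese character: length 13 + 9 = 22
        System.out.println(misdecode(term).length());
    }
}
```

If that reading is right, the file is probably being read correctly and the loss happens later, wherever the string is turned back into bytes.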
About the Data:
The CSV file is very simple in structure.
There's only one field / one column, which I name TERM, and later use as ${TERM}
I don't really need full CSV because it's only one string per line.
There are no commas or quotes.
When I run the Unix "file" command on the file, it says UTF-8 text.
I've also verified it in command line and graphical mode on two machines.
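As a cross-check on the "file" result, here is a minimal sketch of a strict UTF-8 validation in Java. It only confirms that the bytes are well-formed UTF-8; it says nothing about what JMeter does with them. The class name is hypothetical, and the default path is just the filename from my config.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class Utf8Check {
    // Returns true only if the whole byte array is well-formed UTF-8.
    // REPORT makes the decoder throw instead of silently substituting.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String path = args.length > 0 ? args[0] : "japanese-searches.csv";
        byte[] bytes = Files.readAllBytes(Paths.get(path));
        System.out.println(isValidUtf8(bytes) ? "valid UTF-8" : "not valid UTF-8");
    }
}
```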
JMeter CSV Data Set Config:
Filename: japanese-searches.csv
File encoding: UTF-8 (also tried without)
Variable names: TERM
Delimiter: ,
Allow Quoted Data: False (I also tried True; the result was different but still wrong)
Recycle at EOF: True
Stop at EOF: False
Sharing mode: All threads
A few things I've tried:
Tried Allow Quoted Data. The output changed to other strange characters.
Tried passing -Dfile.encoding=UTF-8 to the JVM.
Tried encoding the POST, but it just turned into a string of %nn sequences for question marks.
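The percent-encoding result can be reproduced in plain Java, which suggests the encoder is working on a string that already contains literal question marks. This is a sketch using the standard `java.net.URLEncoder`, not JMeter's own encoding path, and the class name is made up.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class PercentEncodeDemo {
    // Percent-encode a string as UTF-8 form data.
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Intact Japanese text: prints %E6%97%A5%E6%9C%AC
        System.out.println(encode("\u65e5\u672c"));

        // Text already replaced by question marks: prints %3F%3F
        System.out.println(encode("??"));
    }
}
```

So seeing %3F in the request means the damage happened before the encoding step, not during it.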
I'm also not sure how to debug right after each line of the CSV is read in. I think it's corrupted right away, but I'm not sure.
If it's only mangled when I reference it, then instead of ${TERM} perhaps there's some other "to bytes" function call. I'll start checking into that. I haven't done anything with the JMeter functions yet.
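To tell "corrupted on read" apart from "corrupted on send", one option is to dump the code points of the value right after it is read. The helper below is a hypothetical sketch in plain Java. I am assuming it could be called from a scripting element with the value of `vars.get("TERM")`, which I have not tried yet.

```java
class CodePointDump {
    // Render each code point as U+XXXX so the real content is visible
    // even when the console or log cannot display Japanese.
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Intact text: prints U+0041 U+65E5
        System.out.println(dump("A\u65e5"));

        // Already-mangled text: prints U+0041 U+003F
        System.out.println(dump("A?"));
    }
}
```

If the dump shows U+003F for the Japanese part, the variable is already damaged when it is read; if it shows the real code points, the loss happens when the request body is built.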