3

Is there any tool that makes it easy to detect the character encoding of a string?

asked 2014-02-27 20:02:08 -0600

PAC

I often have to work with files whose character encoding is unknown, and I spend a long time working through a long list of possible encodings. Is there a tool that makes it easy to detect the character encoding of a string?


4 Answers

3

answered 2014-03-04 05:57:07 -0600

Tony Hirst

I think there is a command-line tool for that. On macOS the flag is -I; on Linux the equivalent is lowercase -i:

file -I {filename}
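
For anyone who wants to call the same utility from a script, here is a minimal Python sketch. It assumes the file binary is on the PATH and that it accepts the --brief and --mime-encoding options; the helper name detect_with_file is made up for illustration. It returns None when the utility is missing or fails.

```python
import shutil
import subprocess
from typing import Optional


def detect_with_file(path: str) -> Optional[str]:
    """Return the encoding name reported by file(1), or None if unavailable.

    Hypothetical helper: it only wraps the command-line utility.
    """
    exe = shutil.which("file")
    if exe is None:
        # The utility is not installed or not on the PATH.
        return None
    result = subprocess.run(
        [exe, "--brief", "--mime-encoding", "--", path],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return None
    # Strip the trailing newline; treat empty output as "unknown".
    return result.stdout.strip() or None
```

The result is whatever label the utility prints, so treat it as a hint to verify rather than a guarantee.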

1

answered 2014-03-04 05:47:46 -0600

Harry Wood

updated 2014-03-04 05:48:25 -0600

A similar question on StackExchange is "How to identify UTF-8 encoded strings", although it mainly covers programmatic character detection in different languages.

The approach I've used (my answer to that question) is Ruby code using the 'chardet' gem.

1

answered 2014-03-06 05:23:32 -0600

Rufus Pollock

As Tony says, you can use the file utility. That utility itself uses libmagic, and there's a nice Python wrapper for it here:

http://filemagic.readthedoc...

>>> import magic
>>> with magic.Magic(flags=magic.MAGIC_MIME_ENCODING) as m:
...     m.id_filename('setup.py')
...
'us-ascii'

If you are looking to read tabular data, I note that http://github.com/okfn/mess... has "magic" file encoding support built in.

0

answered 2015-03-02 04:31:28 -0600

PAC

The chardet library in Python seems to be a good solution: https://chardet.github.io/
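
A minimal sketch of how this might look in practice, assuming chardet is installed (pip install chardet). The helper name guess_encoding and the fallback list are made up for illustration. If chardet is not importable, or it gives no answer, the sketch falls back to trying a short list of candidate encodings in order, which automates the manual trial-and-error from the question.

```python
from typing import Iterable, Optional

try:
    import chardet  # third-party; assumed installed via `pip install chardet`
except ImportError:
    chardet = None


def guess_encoding(
    data: bytes, fallbacks: Iterable[str] = ("utf-8", "cp1252")
) -> Optional[str]:
    """Guess the encoding of raw bytes (hypothetical helper).

    Uses chardet's statistical guess when available, otherwise returns the
    first fallback encoding that decodes the bytes without error.
    """
    if chardet is not None:
        guess = chardet.detect(data)  # dict with 'encoding' and 'confidence'
        if guess["encoding"]:
            return guess["encoding"]
    for name in fallbacks:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None
```

Detection of this kind is a statistical guess, so short inputs in particular can be misidentified; it is worth checking the decoded text by eye before trusting the result.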


Stats

Asked: 2014-02-27 20:02:08 -0600

Seen: 1,504 times

Last updated: Mar 02 '15