Extracting data from Wikipedia tables

asked 2013-06-12 07:44:16 -0500

PAC

updated 2013-06-12 07:45:35 -0500

I would like to extract data from a series of Wikipedia tables. I've found a nice add-on for Firefox (ExportToCSV), but unfortunately it doesn't export data that sits inside internal links. For instance, if you try to use it with this table, you will not get the names of the manager and the captain. Does anyone know a better tool? I'd like something very easy to use.

4 Answers

answered 2013-06-13 01:27:47 -0500

PAC

I've found a solution to my problem: the html2table plugin in Chrome.

answered 2013-06-13 22:39:47 -0500

Andrew Duffy

The "Scraper" plugin in Chrome also works on that table.

I'm a big fan of the Scraper extension and usually teach it in workshops. I will have a look at html2table though, as suggested by @PAC. One advantage of the Scraper extension: you can even scrape more complex websites.

mihi ( 2013-06-17 07:08:06 -0500 )

answered 2013-06-25 04:00:45 -0500

Tony Hirst

updated 2013-06-25 04:30:52 -0500

Google Sheets (aka Google Spreadsheets) has a handy formula called =importHtml() that can import a table or HTML list from a web page given its web location/URL:

  • =importHtml(URL, "table", N)
  • =importHtml(URL, "list", M)

where N is the Nth table in the webpage at URL, and M is the Mth list in the page.
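For readers who want the same "Nth table" selection outside a spreadsheet, here is a minimal Python sketch using only the standard library. It assumes 1-based counting, as in the formula above, and it does not handle nested tables. The names `TableGrabber` and `nth_table` and the sample HTML are made up for illustration; this is not how the spreadsheet formula is implemented.

```python
from html.parser import HTMLParser


class TableGrabber(HTMLParser):
    """Collects the cell text of every <table> on a page (nested tables not handled)."""

    def __init__(self):
        super().__init__()
        self.tables = []       # list of tables; each table is a list of rows
        self._in_cell = False  # True while we are inside a <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self.tables[-1].append([])
        elif tag in ("td", "th") and self.tables and self.tables[-1]:
            self.tables[-1][-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            # Text may arrive in several chunks, so append to the current cell.
            self.tables[-1][-1][-1] += data


def nth_table(html, n):
    """Return the rows of the n-th table (1-based, like N in the formula)."""
    parser = TableGrabber()
    parser.feed(html)
    parser.close()
    return [[cell.strip() for cell in row] for row in parser.tables[n - 1]]


# Made-up sample page with two tables.
SAMPLE = """
<table><tr><th>Team</th></tr><tr><td>Arsenal</td></tr></table>
<table><tr><th>Year</th><th>Winner</th></tr><tr><td>2012</td><td>Chelsea</td></tr></table>
"""

print(nth_table(SAMPLE, 2))
```

Like the formula, this keeps only the visible text of each cell, so link targets are dropped.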

You can find an example of how to use this formula here: (Feeding Google Spreadsheets: Exercises in using importHTML, importFeed, importXML, importRange and importData (with some QUERY too))

Unfortunately, it doesn't cope with the links either...

For a "simple" tool that will extract the links into a separate column, try OutWit Hub.
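If you are comfortable with a little scripting, the idea of putting link targets into their own column can be sketched in Python with the standard library alone. This is only an illustration under stated assumptions: the class `LinkTableParser`, the function `rows_with_links`, and the sample HTML are hypothetical, only the first link in each cell is kept, and this is not how any of the tools mentioned here work internally.

```python
from html.parser import HTMLParser


class LinkTableParser(HTMLParser):
    """Turns table rows into lists of (text, href) pairs, one pair per cell."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None  # [text, href] for the cell being read, else None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self._cell = ["", None]
        elif tag == "a" and self._cell is not None and self._cell[1] is None:
            # Keep the first link found in the cell.
            self._cell[1] = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            text, href = self._cell
            self.rows[-1].append((text.strip(), href))
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell[0] += data


def rows_with_links(html):
    """Parse an HTML fragment and return its table rows as (text, href) pairs."""
    parser = LinkTableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up sample table with one linked cell.
SAMPLE = """
<table>
  <tr><th>Club</th><th>Manager</th></tr>
  <tr><td>Example FC</td><td><a href="/wiki/Jane_Doe">Jane Doe</a></td></tr>
</table>
"""

for row in rows_with_links(SAMPLE):
    flat = []
    for text, href in row:
        flat.extend([text, href or ""])  # text column, then link column
    print(flat)
```

Each cell becomes two columns, the visible text and the link target, with an empty string where the cell has no link.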

Very nice tool!

PAC ( 2013-06-25 04:20:43 -0500 )

answered 2013-06-24 06:27:12 -0500

phillchill

Also, if you'd like to get into more customizable scraping, check out You can write your own custom scraper in Python that saves results to their database, and then download the results as a SQLite database, .csv table or JSON file.
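The three export formats mentioned above can also be produced locally with Python's standard library. The sketch below is not the hosted service's API; the `rows` list, the `clubs` table, and its column names are hypothetical stand-ins for whatever a scraper collects.

```python
import csv
import io
import json
import sqlite3

# Hypothetical scraped rows; a real scraper would build this list from a page.
rows = [
    {"club": "Example FC", "manager": "Jane Doe"},
    {"club": "Sample United", "manager": "John Roe"},
]

# SQLite: an in-memory database here; pass a file path to keep the data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clubs (club TEXT, manager TEXT)")
conn.executemany("INSERT INTO clubs VALUES (:club, :manager)", rows)
conn.commit()
stored = conn.execute("SELECT club, manager FROM clubs ORDER BY club").fetchall()

# CSV: written to a string buffer; use open("clubs.csv", "w", newline="") for a file.
csv_buffer = io.StringIO()
writer = csv.DictWriter(csv_buffer, fieldnames=["club", "manager"])
writer.writeheader()
writer.writerows(rows)
csv_text = csv_buffer.getvalue()

# JSON: the same rows as a list of objects.
json_text = json.dumps(rows, indent=2)

print(stored)
print(csv_text)
print(json_text)
```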

For inspiration, there are quite a number of Wikipedia scrapers:

