The guys at import.io have just announced an update that allows you to feed in a list of URLs, then scrape the data associated with each of them.
Here is an example of how this works in 3 simple steps that can be applied to most sources of candidate data (including social media sites).
My source data is a list of URLs that provide information about the athletes who competed in Aquatics for Team Scotland at the 2014 Commonwealth Games in Glasgow.
(1) Find a source list of URLs and extract these into a spreadsheet
I found the list in Google, then used the DataMiner Chrome Extension to extract the URLs into a CSV. You could use import.io to do this, but DataMiner is quicker.
(NOTE – make sure that the extracted URLs have http:// preceding them. If they don’t, add it in or it will cause you a problem later. For a big list of URLs, you can do this quickly with the CONCATENATE function, e.g. =CONCATENATE("http://", A2) if the first URL is in cell A2).
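If you would rather script step 1 than do it by hand, the idea can be sketched in Python. This is only a rough stand-in, not how DataMiner works: it assumes you have the HTML of the source page as a string, and the names LinkCollector, ensure_scheme and links_to_csv, along with the example.com addresses, are made up for illustration.

```python
import csv
import io
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in the page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value.strip())


def ensure_scheme(url, scheme="http://"):
    """Prefix url with http:// unless it already has http:// or https://.

    Assumption: the link already contains a host name (www.example.com/...).
    Site-relative paths such as /athletes/1 would need urllib.parse.urljoin
    with the page address instead.
    """
    url = url.strip()
    if url.lower().startswith(("http://", "https://")):
        return url
    return scheme + url.lstrip("/")


def links_to_csv(html):
    """Return CSV text with a 'url' header and one cleaned link per row."""
    collector = LinkCollector()
    collector.feed(html)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["url"])
    for link in collector.links:
        writer.writerow([ensure_scheme(link)])
    return buffer.getvalue()


# Placeholder HTML standing in for the real source page.
SAMPLE_HTML = """
<ul>
  <li><a href="www.example.com/athletes/1">Athlete one</a></li>
  <li><a href="http://www.example.com/athletes/2">Athlete two</a></li>
  <li><a href="//www.example.com/athletes/3">Athlete three</a></li>
</ul>
"""

print(links_to_csv(SAMPLE_HTML))
```

For a real page, you would save the page source to a file, read it into a string, pass it to links_to_csv, and write the result out as a .csv ready for step 3.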
(2) Tell import.io what data you want to scrape from each URL
Open import.io, then select New, then Extractor. Within the import.io browser, navigate to one of the URLs that you extracted in step 1. Then:-
(3) Feed the source list of URLs into import.io and extract
When PUBLISH is selected in step 2, import.io will show the finished API page. Note that from this page you can link the API to Google Sheets and access the GET and POST URLs.
Copy the list of URLs from the CSV created in step 1 into the Bulk Extract section. Select RUN QUERIES and this will display the data. If you have not linked a Google Sheet to the query, select OPEN AS DATA SET to download the results as a spreadsheet.
Video to follow.