Extract volume candidate data in 3 easy steps using import.io

The guys at import.io have just announced an update that allows you to feed in a list of URLs, then scrape the data associated with them.

Here is an example of how this works in 3 simple steps that can be applied to most sources of candidate data (including social media sites). 
My source data is a list of URLs that provide information about the athletes who competed in Aquatics for Team Scotland at the 2014 Commonwealth Games in Glasgow.
(1) Find a source list of URLs and extract these into a spreadsheet
I ran this search in Google, then used the DataMiner Chrome Extension to extract the URLs into a CSV. You could use import.io to do this, but DataMiner is quicker.
(NOTE – make sure that the extracted URLs have http:// preceding them. If they don’t, add it in or it will cause you a problem later. For a big list of URLs, you can do this quickly by using the CONCATENATE function.)
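In a spreadsheet, assuming the URLs sit in column A, a formula along the lines of =CONCATENATE("http://", A2) does the job. If you would rather fix the list outside the spreadsheet, here is a minimal Python sketch of the same idea. The function name ensure_scheme and the example.com addresses are made up for illustration; they are not part of import.io or DataMiner.

```python
def ensure_scheme(url, default_scheme="http"):
    """Return url with default_scheme:// prepended when it has no http(s) prefix."""
    url = url.strip()
    # Leave URLs alone if they already start with http:// or https://
    if url.lower().startswith(("http://", "https://")):
        return url
    return f"{default_scheme}://{url}"


# Hypothetical sample data standing in for the URLs extracted in step 1
urls = ["www.example.com/athlete/1", "http://www.example.com/athlete/2"]
fixed = [ensure_scheme(u) for u in urls]
```

Unlike a blanket CONCATENATE, this only touches the URLs that are missing the prefix, so a mixed list does not end up with http://http:// on some rows.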
(2) Tell import.io what data you want to scrape from each URL
Open import.io, then select New, then Extractor. Within the import.io browser, navigate to one of the URLs that you extracted in step 1. Then:
  • Turn on Extraction. 
  • Name your first column. In this case, I want this to be ‘Athlete Name’. Then select the data from the page that you want to be mapped to your first column. 
  • For each bit of data you wish to extract, select ‘New Column’ then repeat the step above. In this example, I’m going to extract when and where the athlete was born, their chosen sport, and where they currently live.  

  • Once you are happy with this, select DONE. Name your extraction/API (I’m going to call this Scottish Aquatics Athletes) and select PUBLISH. 

(3) Feed the source list of URLs into import.io and extract

When PUBLISH is selected in step 2, import.io will show the finished API page. Note that you can link the API to Google Sheets from this page, and access the GET and POST URLs.
Copy the list of URLs from the CSV created in step 1 into the Bulk Extract section. Select RUN QUERIES and this will display the data. If you have not linked a Google Sheet to the query, select OPEN AS DATA SET to download the data as a spreadsheet.
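If the CSV from step 1 is large or contains repeats, it can help to tidy the list before pasting it in. The sketch below is one possible way to do that in Python: it pulls a single column out of the CSV text, drops blanks and duplicates, and joins the result one URL per line. The column name "url", the function name urls_from_csv and the sample addresses are assumptions for illustration; your DataMiner export may use a different header, and I am assuming that Bulk Extract takes one URL per line.

```python
import csv
import io


def urls_from_csv(csv_text, column="url"):
    """Collect unique, non-empty values from one CSV column, in first-seen order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    seen = set()
    urls = []
    for row in reader:
        value = (row.get(column) or "").strip()
        # Skip empty cells and anything already collected
        if value and value not in seen:
            seen.add(value)
            urls.append(value)
    return urls


# Hypothetical CSV content standing in for the file created in step 1
sample = "url\nhttp://example.com/a\nhttp://example.com/b\nhttp://example.com/a\n\n"

# One URL per line, ready to copy and paste
bulk_input = "\n".join(urls_from_csv(sample))
```

To use it on a real file, read the file's contents into a string first and pass that in place of sample.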
Video to follow.
