Our client is a North American company, based in the USA, that runs an online directory portal. The portal is meant to list businesses from many different regions.
The client was working on some unique enhancements to make their portal stand out in the market. The main challenge was getting the data for the directory: the information was spread across different regions, and the goal was to get the maximum number of business profiles published within a limited period of time.
The data was scattered across many yellow pages sites and had to be pulled into the directory portal: mainly the business name, address, geolocation, description, images, and so on.
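The fields listed above can be captured in a single record type. The sketch below is a minimal Python illustration built around a hypothetical `BusinessListing` class. The field names, the split of geolocation into latitude and longitude, and the `normalized_key` helper are our assumptions, not the client's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class BusinessListing:
    """One scraped business profile (hypothetical schema, not the client's)."""
    name: str
    address: str
    latitude: Optional[float] = None    # geolocation, if the source site exposes it
    longitude: Optional[float] = None
    description: str = ""
    image_urls: List[str] = field(default_factory=list)
    source_site: str = ""               # which yellow pages site the record came from

    def normalized_key(self) -> Tuple[str, str]:
        """Case- and whitespace-insensitive (name, address) pair for spotting duplicates."""
        def norm(text: str) -> str:
            return " ".join(text.lower().split())
        return norm(self.name), norm(self.address)


# The same business scraped from two different sites.
a = BusinessListing("Joe's  Pizza", "12 Main St ", source_site="site-a")
b = BusinessListing("joe's pizza", "12 main st", source_site="site-b")
print(a.normalized_key() == b.normalized_key())  # True
```

Normalizing the name and address before comparing them lets the same business, listed with slightly different spacing or capitalization on two sites, be recognized as one profile.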
Time was a major concern for our client; they found it difficult to populate business listings from all over the internet. The normal registration procedure, which picks up each business and enrolls it into the system, was not efficient, and the cost of building and marketing the portal was sky-high.
The result surprised the client. The process, initially estimated to take one month, was completed within 2 weeks. We scraped almost 99% of the data from all 10 sites, and the insertion step writes each of those details directly into the client's beta site.
ECIT started working on all 10 selected sites simultaneously. The rough estimate was to populate one million (10 lakh) entries in a month. We purchased a server and started scripting the automation process. We faced some challenges at the start, such as excluding duplicate entries fed into the DB. We created triggers to identify and remove all unwanted listings and to block any similar updates in the future. We started running the actual scraping process within 8 days. Because scraping is slow and depends on the speed of the site being scraped, we ran 50 browser sub-processes in parallel for each site.
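Duplicate exclusion with a database trigger can be sketched as follows. This is a minimal example using SQLite through Python's standard `sqlite3` module; the `listings` table, its columns, and the matching rule (same name and address, compared case-insensitively after trimming whitespace) are assumptions, since the project's actual database and schema are not stated. A `BEFORE INSERT` trigger calls `RAISE(IGNORE)`, so a duplicate row is silently skipped instead of failing the whole batch.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS listings (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    address     TEXT NOT NULL,
    source_site TEXT
);

-- Skip any insert whose (name, address) already exists,
-- compared case-insensitively after trimming outer whitespace.
CREATE TRIGGER IF NOT EXISTS skip_duplicate_listing
BEFORE INSERT ON listings
FOR EACH ROW
WHEN EXISTS (
    SELECT 1 FROM listings
    WHERE lower(trim(name)) = lower(trim(NEW.name))
      AND lower(trim(address)) = lower(trim(NEW.address))
)
BEGIN
    SELECT RAISE(IGNORE);
END;
"""


def insert_listings(conn: sqlite3.Connection, rows) -> None:
    """Insert (name, address, source_site) rows; the trigger drops duplicates."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO listings (name, address, source_site) VALUES (?, ?, ?)",
            rows,
        )


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
insert_listings(conn, [
    ("Joe's Pizza", "12 Main St", "site-a"),
    ("joe's pizza ", "12 main st", "site-b"),  # same business from another site
    ("Ann's Bakery", "5 Oak Ave", "site-a"),
])
count = conn.execute("SELECT COUNT(*) FROM listings").fetchone()[0]
print(count)  # 2
```

Putting the check in the database rather than in each scraper means every parallel worker gets the same duplicate rule without coordinating with the others.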
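The parallel layout described above (all 10 sites at once, 50 workers per site) can be sketched with Python's `concurrent.futures`. This is an illustration under stated assumptions: `fetch_page` and `fake_fetch` are hypothetical stand-ins for a real browser-driven fetch, and a thread pool is used because each worker mostly waits on the network. The project itself used browser sub-processes, for which a `ProcessPoolExecutor` or one browser instance per worker would be the closer match.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

WORKERS_PER_SITE = 50  # matches the 50 parallel workers per site described above

Fetcher = Callable[[str], Dict]  # hypothetical: URL -> parsed listing


def scrape_site(page_urls: List[str], fetch_page: Fetcher,
                workers: int = WORKERS_PER_SITE) -> List[Dict]:
    """Fetch every page of one site with a bounded pool of workers.

    Results come back in the same order as page_urls.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch_page, page_urls))


def scrape_all(sites: Dict[str, List[str]], fetch_page: Fetcher) -> Dict[str, List[Dict]]:
    """Scrape all sites at the same time, each with its own worker pool."""
    with ThreadPoolExecutor(max_workers=max(1, len(sites))) as outer:
        futures = {site: outer.submit(scrape_site, urls, fetch_page)
                   for site, urls in sites.items()}
        return {site: fut.result() for site, fut in futures.items()}


def fake_fetch(url: str) -> Dict:
    """Hypothetical stand-in for a browser fetch: returns a dummy record."""
    return {"url": url}


sites = {f"site-{i}": [f"https://example.com/{i}/page/{p}" for p in range(3)]
         for i in range(2)}
results = scrape_all(sites, fake_fetch)
print(len(results["site-0"]))  # 3
```

Giving each site its own capped pool, rather than one global pool, keeps a slow site from starving the faster ones and bounds the load placed on any single site.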
17 lakh (1.7 million) business profiles scraped
Profiles populated within 2 weeks
Insertion restricted to genuine business profiles only