Web scraping: an in-depth overview

Competitor Analysis

Jane is a sales rep at Intermarket Inc. She is preparing for the greatest pitch of her career: a new innovation by her company that could change the future of businesses the world over. She needs the prices charged by competitors who have been building similar solutions. The fastest way for her to get those prices is web scraping.

She has tried to get pricing manually by opening competitor sites, but it is quite time-consuming. She also tried some web scraping software, but it was not beginner-friendly, and some of it gave her irrelevant results. She needs to prepare for the presentation, not spend a considerable amount of her time scraping for prices.

She talks to her supervisor Anne about the issue at hand. After a short brainstorming session, they agree that talking to a software development company might be a great option. Some of the questions they have in mind are: 

  • What does web scraping mean?
  • How does it work?
  • Is it legal?

So Anne calls the software company:

Image: Competitor analysis (photo by Anna Shvets from Pexels)

If you are in the same situation as Jane and Anne, we are going to take you on a journey where we explore web scraping. We hope to answer some of the questions that you have.

What is web scraping?

Web scraping can be defined simply as downloading or extracting website data in order to analyze it and use it in a project, task or process. It is important to note that this data extraction is an automated process done by software. Humans can extract data from sites too, but not with the speed and efficiency that software can.

It is also referred to as web harvesting, or web data extraction.

In order to understand web scraping a little better, let us differentiate it from a few similar concepts:

Web scraping vs data scraping

Both of these processes involve extracting data. The main difference between data scraping and web scraping is the source of the data. While web scraping involves getting data from the web, data scraping more broadly involves extracting human-readable output from a computer program.

Web scraping vs screen scraping

The difference between web scraping and screen scraping is in the type of data extracted. In screen scraping, screen data is extracted from an old or obsolete application (called a legacy application) and transferred to another application. This is usually done for a better user experience (UX), since the old application may be incompatible with newer technologies and therefore may not render well on newer operating systems (OSes) or devices.

What is web scraping used for?

There are several ways that you can use the data that you extract from websites. Let’s explore some:

Sentiment analysis

Let’s say that you have introduced a new product or service in your company. Scraping can help you get an overall sense of how your intended customers feel about it. You can use data from social media platforms, for example, to analyze your customers’ views on your product or service.

Lead generation


Businesses need revenue to keep growing. This is why generating leads and talking to prospective customers is essential. Lead generation can, however, be very time-consuming. An easy way out is outbound lead generation, where the same communication is sent out to many people who fit your target audience description. This is not very targeted and usually does not generate high-quality leads.

Web scraping comes in to make lead generation more precise and less time consuming. You can generate leads in a matter of minutes. You are also able to narrow down your leads, as you get to choose where to source leads from, resulting in higher quality leads.

Competitor monitoring

You need to stay abreast of what your competition is up to. This way, you are able to see what you are doing right, and spot opportunities that you could take advantage of (since your unique set of circumstances could leave you better placed than they are, for example).

Web scraping can help you gather data about the products that your competition is offering, their pricing, and their marketing strategy. You can then use it to adjust your pricing, for example, or provide a value added service that could result in more revenue.

How web scraping works

The following image summarizes the web scraping process:

Image: The web scraping process (source: https://prowebscraper.com/blog/what-is-web-scraping/)

In a nutshell, the process involves sending a request to the website(s) from which you want to extract data. Once the site responds with the page, the software extracts the particular data you are after, for example prices or product reviews. The next step is saving the data in the desired format, for example CSV or Excel. The file can then be stored on your computer or loaded into a database.
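The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production scraper: the HTML snippet, the `name` and `price` class names, and the helper names `PriceParser`, `extract_prices` and `to_csv` are all assumptions made for the example. A real scraper would first download the page over HTTP and would have to match the actual markup of the target site.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical page content. A real scraper would download this with an
# HTTP request after checking that the site permits scraping.
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="name">Widget A</span><span class="price">19.99</span></li>
  <li class="product"><span class="name">Widget B</span><span class="price">24.50</span></li>
</ul>
"""


class PriceParser(HTMLParser):
    """Collects [name, price] rows from spans with class 'name' or 'price'."""

    def __init__(self):
        super().__init__()
        self.rows = []        # finished [name, price] rows
        self._field = None    # field the parser is currently inside, if any
        self._current = {}    # row being built for the current <li>

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "span" and css_class in ("name", "price"):
            self._field = css_class

    def handle_data(self, data):
        if self._field:
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li" and self._current:
            self.rows.append([self._current.get("name"), self._current.get("price")])
            self._current = {}


def extract_prices(html):
    """Step 2: pull the data of interest out of the page."""
    parser = PriceParser()
    parser.feed(html)
    return parser.rows


def to_csv(rows):
    """Step 3: store the extracted data in CSV format."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["name", "price"])
    writer.writerows(rows)
    return buffer.getvalue()


rows = extract_prices(SAMPLE_HTML)
print(to_csv(rows))
```

Writing to an in-memory buffer keeps the sketch self-contained; swapping `io.StringIO()` for an open file would save the same rows to disk.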

Is web scraping legal?

The question of whether scraping data on the web is legal, or even ethical, is a common one and a legitimate concern.

In fact, we began a conversation about it on a Facebook group as shown by the screenshot below, just to get a feel of how different people view it:

Image: Is web scraping legal/ethical? (screenshot of the Facebook group conversation)

As we gathered from the conversation, and from our experience with our clients, it is important to consider whether a particular site has stated that it does not want its data to be scraped. Some sites use a robots.txt file, which tells automated clients which parts of the site they should not crawl.

It is ethical to respect such restrictions. 
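As a rough sketch of how a scraper can honor such restrictions, Python's standard library includes `urllib.robotparser`. The robots.txt content, the `example.com` URLs, the `price-bot` user agent and the `is_allowed` helper below are all hypothetical and serve only to illustrate the check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. A real check would load the file from the
# site itself, for example with set_url(...) followed by read().
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, url, user_agent="price-bot"):
    """Return True if the given robots.txt rules permit fetching the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(ROBOTS_TXT, "https://example.com/products"))
print(is_allowed(ROBOTS_TXT, "https://example.com/private/prices"))
```

Passing a robots.txt check does not by itself settle the legal question; it only shows that the site owner has not asked crawlers to stay away from that path.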

It is also advisable to seek legal advice before charting a way forward, as there might be jurisdictions where scraping is illegal.

In conclusion

Whether you are looking to do some competitor analysis, generate leads, or gain insights about how your customers feel about your product, service or brand in general, web scraping is the most effective way to go about extracting this data. 

If you are looking for a reliable software company to help you with your Python web scraping project, then look no further. We will build you a solution that gets you the data you need with a short turnaround time. We also go out of our way to do our work ethically and legally. Contact us today!

Tech trends today: a look at the tech in April 2020 that matters

Why would we need to put together a piece about what’s happening in the world of tech? Well, there might be tech enthusiasts out there who love to know. However, if you are a tech enthusiast, you probably already know what we are about to discuss. 

We therefore thought about the prudent business person who needs to make complex decisions every day. Knowing what the latest tech trends are will help in making more informed decisions. It may make the discussion between the business owner and the IT department easier. Better still, it will inform the digital transformation journey that the business needs to undertake or fine-tune.

So, in this article, we look at tech and trends that made some interesting milestones in the month of April, and that would have an impact in the future. Join in on this journey.

Tech trend #1: Up to 8 people will soon be able to join WhatsApp audio or video calls

Image: Working from home software (source: https://bit.ly/2Wq10Rn)

The working from home culture, spurred by the Covid-19 pandemic, has seen a spike in the demand for working from home software. By late January, Microsoft announced in a blog post by their Corporate Vice President for Microsoft 365 that they had seen a 500% increase in meetings, calls and conferences held on the Teams platform, and a 200% increase in the usage of the same platform on mobile. Absolutely staggering.

Video conferencing tools have become increasingly important as lockdowns the world over restrict movement. This has forced businesses to allow their staff to work remotely. It has also shown businesses that remote work is possible in the world beyond Covid-19.

In fact, according to a recent survey by Gartner:

74% of the surveyed companies plan to have at least 5% of their staff who previously worked from the office working remotely, after Covid-19.

Image: Work from home stats (source: Gartner, April 2020)

It is no wonder that WhatsApp’s group call feature is also receiving updates to gear up for the video conferencing upsurge, even though WhatsApp at its core is not working from home software. The initial update in early April allowed 4 participants in a group call, as shown by this Tweet:

Image: WhatsApp call with 4 participants announcement

However, there is a beta version of WhatsApp on both Android and iOS (2.20.133 on Google Play and 2.20.50.25 on iOS, from TestFlight) that allows up to 8 participants. All the participants need to be using this beta version though.

This update provides more communication options for teams and companies working remotely, and could not be more timely. It is also a great option because WhatsApp calls use end-to-end encryption, which keeps them private.

Tech trend #2: Over 22 million cybersecurity attacks targeted at businesses

This tech trend is undoubtedly worrying. According to IBM’s ‘The Cost of a Data Breach’ 2019 report, the average cost of a data breach is $3.92 million.

Image: The average cost of a data breach

The Covid-19 pandemic has seen an increase in cybersecurity attacks, with this article by the Wall Street Journal stating that the effects of these data breaches may even be long term.

What this means is that businesses have to prioritize the security of their systems and processes, as well as all the data involved. According to McKinsey’s ‘Perspectives on Transforming Cybersecurity’ publication, some approaches that businesses can take include:

  • Building a holistic program to protect the business that goes beyond technical controls. Such a program will take into consideration organizational structure, digital assets, business strategy, the role of the human in cybersecurity risks, and use of threat analytics.
  • Involving all stakeholders in making decisions about cybersecurity. This allows for threats to be understood from different perspectives. When measures and policies are put into place, people in the organization are then more likely to accept and support them.

Tech trend #3: Slack retires Foundry

Foundry was a ‘bot to help you with the basics of using Slack’. In case you have not heard of it, Slack is a communication platform for teams to keep in touch. Slack now stands at 12.5 million users, and boasts an engagement of 90 minutes a day. This outranks Facebook, Instagram and YouTube, at 58, 53, and 40 minutes respectively.

Slack had launched Foundry to make it easier for users to navigate their platform. However, they retired it ‘a while ago’ as per their Tweet:

Image: Slack retires Foundry (Twitter conversation)

Not much was said about the company’s decision, or the factors that may have contributed to it. Of interest, however, is the timing of the decision.

Bots are being used to address challenges around customer support and the speed at which customers can access information (which is a major frustration with online experiences), as shown in the two images below:

Image: Challenges with online customer experiences
Image: Response time expectations from customers

So, why would Slack stop using a bot to help users navigate their product when bots seem to be gaining popularity among consumers? Like we said, we do not know for sure. What we do know is that there is more to the equation than bringing in bots for improved customer experiences online.

Bots or virtual assistants can answer 80% of incoming questions without assistance. In the case of Slack’s Foundry, users agreed that the bot could make it easier to navigate the product, as shown in the image below:

Image: Slack’s Foundry

But the truth is, bots have their limitations too. They may not be able to handle complex problems or handle an angry or frustrated customer efficiently, as a human would. 

So, the best case scenario is to have humans and bots working together. Bots can be left to handle simple and direct questions while humans can handle the complex ones, as shown in this MIT report: The Future of Customer Service Is AI-Human Collaboration.

What this means for businesses is that as they plan to automate their online customer experiences, they should strive for the balance between AI and humans.
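The split between bots and humans described above can be sketched as a simple routing rule. This is only an illustration under loud assumptions: the canned answers, the escalation keywords and the `route` function are hypothetical, and a real system would likely rely on a trained intent or sentiment model rather than keyword matching.

```python
# Hypothetical canned answers the bot can give on its own.
BOT_ANSWERS = {
    "reset password": "Use the 'Forgot password' link on the sign-in page.",
    "opening hours": "Support is available on weekdays from 9am to 5pm.",
}

# Hypothetical words that suggest a frustrated customer or a complex case.
ESCALATION_WORDS = {"angry", "refund", "complaint", "frustrated"}


def route(message):
    """Return ("bot", answer) for simple questions, else ("human", None)."""
    text = message.lower()
    # Frustrated customers go straight to a person, even on a known topic.
    if any(word in text for word in ESCALATION_WORDS):
        return ("human", None)
    # Simple, direct questions on a known topic are answered by the bot.
    for topic, answer in BOT_ANSWERS.items():
        if topic in text:
            return ("bot", answer)
    # Anything the bot does not recognize is handed to a human.
    return ("human", None)


print(route("How do I reset password?"))
print(route("I am angry that I cannot reset password"))
```

Defaulting to a human when nothing matches keeps the bot from guessing at questions it was not set up to handle.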

In conclusion

We have looked at some tech trends that made news in April, but they may as well be the tech trends that shape business going forward. Knowing the above trends will help you begin making decisions that can future-proof your business.

Are you looking for ways to automate your business, for example using chatbots? Are you looking to integrate communication tools or, better still, to build your own communication app? We are a team of developers well versed in building custom solutions for businesses, and we take the appropriate security measures when building them. Contact us for your technology needs.