How to find the API of a website to scrape its content? (Scraping Trustpilot)

Introduction

In this tutorial, we will dive into the various methods available to scrape content from websites. Whether you are looking to create a full-fledged copy of a website or simply need to collect data for a specific project, this tutorial will give you the tools and techniques to access and scrape that content. We will look at how to use the APIs behind popular websites such as LinkedIn, event websites, Welcome to the Jungle, and many others.

We will also look into the more advanced options, such as using web scraping frameworks and APIs from third-party providers, to collect data from complex websites and multi-page structures.

In addition, we will explore the security and privacy measures you need to consider when scraping data from websites.

🎯 By the end of this tutorial, you will have a solid understanding of how to use APIs and web scraping frameworks to scrape content from websites.

Main

Let's start by finding a website to scrape for data. https://fr.trustpilot.com/ is an ideal choice. It offers a good balance between complexity and simplicity, a decent level of security, and a large database. It's also highly reliable, with reviews from both users and businesses, making it a great resource for data mining.

Trustpilot is also a great way to keep track of customer feedback. This helps businesses stay informed about their product or service and ensure they are providing a positive customer experience. All in all, Trustpilot is an excellent example for anyone looking to gather data from a website.

Define objective

The first step is to define what type of data you want to extract from the website.

  • Company information?

  • Number of reviews?

  • Content of reviews?

  • etc…

Everything depends on what you want to scrape, so adapt the rest of this tutorial to that objective.

Let's take company details as an example. The objective is clear: find company details for one (or more, if possible) company.

Crawl website

The goal of this part is to find an existing API on the website. Note that many websites call no API from the front end: all the data is rendered directly into the page itself.

Crawling a website is quite simple:

  • Go to a page on the website

  • Open the browser's developer tools (Inspect element)

  • Click on the Fetch/XHR filter

  • Click on a request → for example fr-fr.json

  • When a request has .json or data in its name, it's a good sign 🚀

💡 Good tip: select Preserve log → when you change pages, the requests are kept in the tool.

Here, we can observe that we have a plethora of APIs:

That's great news! Now we need to make sure these APIs are usable. For example, in the "Response" tab, I can see DomainData.


It's good news when we have this type of information in the endpoint.

Let's continue crawling. 🚀

Category page

To go to the category page, go to the Categories tab and click on the desired category. This will take you to the relevant page.

We have found nothing on this page: no usable API endpoint here. We will need to keep looking through the website for an API that can help us scrape the data we need, so let's continue crawling.

💡 You can use the search field on the left of the Network panel to look for the company name, website, or any other information you can see on the page.

I clicked "View All" on the right side:

It's helpful to see all the information about the company. Perhaps it will help us find the endpoints? 🫣

I can now see a Trustpilot endpoint called /transparency.json?businessUnit=


This endpoint is interesting: the result contains all of the company's information:

"legalSiteUrl": "https://fr.legal.trustpilot.com",
"businessUnit": {
"basiclinkRate": 0,
"claimedDate": "2014-05-23T07:41:11.000Z",
"displayName": "Younited Credit",
"isUsingPaidFeatures": true,
"hasSubscription": true,
"hasTransparencyData": true,
"id": "500990020000640005186711",
"identifyingName": "www.younited-credit.com",
"isAskingForReviews": true,
"isClaimed": true,
"numberOfReviews": 60624,
"verification": {
"verifiedByGoogle": false,
"verifiedBusiness": true,
"verifiedPaymentMethod": true,
"verifiedUserIdentity": false}}

After scouring the website, we eventually discovered an API endpoint that provides a wealth of information about a company, such as its legal site URL, business-unit details, claimed date, display name, verification status, and more.

This endpoint also surfaces review counts, ratings, and links to external sources of information, all of which can be used to build a better understanding of the company and its operations.
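To give an idea of how this endpoint can be queried programmatically, here is a minimal Python sketch. The base URL is an assumption reconstructed from the request observed in the Network panel, and the business-unit ID is simply the one from the response above; check your own developer tools for the exact URL and parameters.

```python
import requests

# Assumed full URL, reconstructed from the request seen in the Network
# panel; verify the real path in your own developer tools.
BASE_URL = "https://fr.trustpilot.com/transparency.json"

def fetch_transparency(business_unit_id: str) -> dict:
    """Fetch the transparency data for one Trustpilot business unit."""
    response = requests.get(
        BASE_URL,
        params={"businessUnit": business_unit_id},
        headers={"User-Agent": "Mozilla/5.0"},  # some sites reject the default UA
        timeout=10,
    )
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    # Business-unit ID taken from the JSON response above (Younited Credit).
    data = fetch_transparency("500990020000640005186711")
    unit = data["businessUnit"]
    print(unit["displayName"], unit["numberOfReviews"])
```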

How to send the request to Postman to improve readability?

Postman is a popular tool that makes API development easier. It is a powerful, easy-to-use platform for testing and developing APIs: it helps developers create, share, test, and document APIs, and makes it quick to debug and troubleshoot them.

I use Postman to test my API endpoints because, in some cases, the browser displays the response incorrectly or not at all.

→ So using Postman gives you the complete output and lets you iterate on the query.

Use Postman to Test

To begin, install Postman and create an account. Postman is a web-based API testing tool with an intuitive interface that lets you quickly set up, execute, and analyze API requests. We will use it here to replay the Trustpilot request we just found.

  • To begin, copy the request as cURL from the developer tools (an example of what this looks like is shown after this list).

  • Then click on Import in Postman.


  • Click on Raw text and paste the cURL command you just copied.


  • Then click on Continue & Import.
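For reference, the copied command looks roughly like the sketch below. The URL and headers are illustrative reconstructions; your browser will produce the real (usually much longer) version.

```bash
# Roughly what "Copy as cURL" produces for the transparency request.
# URL and headers are illustrative; copy the real ones from your browser.
curl 'https://fr.trustpilot.com/transparency.json?businessUnit=500990020000640005186711' \
  -H 'accept: application/json' \
  -H 'user-agent: Mozilla/5.0'
```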

Now you can send the query and check the result.

The response should be displayed in the response window and include the requested data: company details, number of reviews, review content, and more.

This data can then be further analyzed for insights or used in other applications to create a more comprehensive view of the company.

Next steps? 💡

With the company information you have, you can try changing the businessUnit parameter to another company and see the difference. This should work.

However, without the business units, you won't get very far. To find them, check the sitemap. It contains a lot of information on all companies.

Scrape them all and see what comes out; a minimal sketch of this step is shown below. Then, based on the categories, extract the information you need.
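Here is a minimal Python sketch of that step, assuming the sitemap sits at the standard /sitemap.xml location and that company pages live under /review/<domain>; both assumptions should be checked against the real sitemap, which may be an index pointing to several sub-sitemaps.

```python
import xml.etree.ElementTree as ET

import requests

# Assumed standard sitemap location; adapt if it is a sitemap index.
SITEMAP_URL = "https://fr.trustpilot.com/sitemap.xml"
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def extract_company_domains(sitemap_url: str) -> list[str]:
    """Collect company domains from /review/<domain> URLs in a sitemap."""
    xml = requests.get(sitemap_url, headers={"User-Agent": "Mozilla/5.0"}, timeout=30).text
    root = ET.fromstring(xml)
    domains = []
    for loc in root.findall(".//sm:loc", NS):
        url = loc.text or ""
        if "/review/" in url:
            domains.append(url.rsplit("/", 1)[-1])
    return domains

if __name__ == "__main__":
    print(extract_company_domains(SITEMAP_URL)[:10])
```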

It is also possible to send the data to n8n to automate the process, using the HTTP node together with a database such as MongoDB to store all the companies' information.
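If you prefer to script that storage step rather than use n8n, here is a minimal sketch with pymongo; the connection string, database, and collection names are arbitrary placeholders.

```python
from pymongo import MongoClient

# Arbitrary local connection and names; adjust to your own setup.
client = MongoClient("mongodb://localhost:27017")
collection = client["trustpilot"]["companies"]

def store_company(data: dict) -> None:
    """Upsert one transparency payload, keyed by the business-unit ID."""
    unit = data["businessUnit"]
    collection.update_one({"_id": unit["id"]}, {"$set": unit}, upsert=True)
```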

💡 Tip for n8n: you can import cURL directly from Chrome's developer tools. And if a request refuses to work, remove the query parameters from the URL and fill them in directly in the node's URL parameters instead.

This will allow you to query the data whenever you need to retrieve all of a company's information, and to use that data to build a better understanding of the company and its operations.

Conclusion

By the end of this tutorial, you have acquired a comprehensive view of the different approaches available to find and use APIs to scrape content from websites. We explored various techniques and examined how to use APIs to access and scrape data from well-known websites such as LinkedIn, event websites, and Welcome to the Jungle, as well as more advanced options like web scraping frameworks and APIs from third-party providers.

It is also important to consider the security and privacy measures that should be taken into account while scraping data from websites. Finally, we studied how to use Postman to test the API and extract the data.

By following these steps and the advice given throughout this tutorial, you now have the tools and techniques needed to access and scrape content from websites. With this knowledge, you can approach API scraping tasks with confidence and extract data from websites quickly and efficiently.


© 2023 Telmo Crespo
