How to know what resources a page has downloaded using Puppeteer

To find out which resources (CSS, JavaScript files, images, and so on) a page loads behind the scenes, listen to the network events that Puppeteer emits for each page. The example below records the URL and resource type of every request the page makes:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Enable request interception
  await page.setRequestInterception(true);

  // Array to store captured resources
  const resources = [];

  // Event listener for request interception
  page.on('request', interceptedRequest => {
    resources.push({
      url: interceptedRequest.url(),
      type: interceptedRequest.resourceType(),
    });
    interceptedRequest.continue();
  });

  // Navigate to a website
  await page.goto('https://example.com');

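  // Wait a few seconds so that late requests are captured.
  // page.waitForTimeout() was removed in Puppeteer v22, so use a plain timer.
  await new Promise(resolve => setTimeout(resolve, 5000)); // Adjust as needed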

  // Display captured resources
  console.log('Captured Resources:', resources);

  await browser.close();
})();
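
A request being issued does not guarantee that the resource was actually received. As a minimal sketch of one alternative, the variant below listens for 'response' events instead, which needs no request interception, and also records the HTTP status. The helper names toRecord and collectResponses are hypothetical and chosen only for illustration; the sketch assumes the puppeteer package is installed.

```javascript
// Convert a Puppeteer HTTPResponse (or any object exposing the same
// methods) into a plain record. Hypothetical helper name.
function toRecord(response) {
  return {
    url: response.url(),
    status: response.status(),
    type: response.request().resourceType(),
  };
}

// Load targetUrl and return one record per response the page received.
// Hypothetical helper name; puppeteer is required lazily so that
// toRecord can be used without launching a browser.
async function collectResponses(targetUrl) {
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    const resources = [];

    // 'response' fires when a response arrives; no interception required.
    page.on('response', response => resources.push(toRecord(response)));

    // 'networkidle0' resolves once no network connections have been
    // open for at least 500 ms, replacing the fixed delay.
    await page.goto(targetUrl, { waitUntil: 'networkidle0' });
    return resources;
  } finally {
    // Close the browser even if navigation throws.
    await browser.close();
  }
}
```

To try it, call collectResponses('https://example.com').then(console.log). Pages that keep a connection open indefinitely (for example, long polling) may never reach 'networkidle0', in which case a fixed delay remains the simpler option.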

Explanation:

  1. Launch Puppeteer: Start a Puppeteer-controlled browser instance using puppeteer.launch().

  2. Create a New Page: Open a new browser tab/page with browser.newPage().

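  3. Enable Request Interception: Call page.setRequestInterception(true) so that every request the page makes is paused and handed to your 'request' handler. Each intercepted request must then be continued, aborted, or responded to; otherwise the page stalls. Interception is only required if you want to modify or block requests, because the 'request' event also fires without it.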

  4. Capture Resources: Use page.on('request', ...) to listen for intercepted requests. In the callback, push each request's URL and resource type into the resources array, then call interceptedRequest.continue() so that the request proceeds.

  5. Navigate to a Website: Use page.goto('https://example.com') to load a specific URL. Replace 'https://example.com' with the URL of the website you want to access.

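  6. Wait for Requests: Pause for a few seconds after navigation so that requests issued late (for example, by scripts) are also captured. Note that page.waitForTimeout() was removed in Puppeteer v22; on current versions, wrap setTimeout in a Promise or pass { waitUntil: 'networkidle0' } to page.goto().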

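  7. Display Captured Resources: Log or process the resources array, which now lists every resource (CSS, JS, images, etc.) that the page requested. A 'request' event means the request was issued, not that it completed; to confirm that a resource was actually downloaded, also listen for 'requestfinished' or 'response'.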

  8. Close the Browser: Always close the browser instance using browser.close() to free up system resources once you've finished using Puppeteer.
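
As an illustration of the "process the resources array" part of step 7, here is a small sketch that counts captured entries per resource type. The helper name countByType and the sample URLs are made up for the example; the entries use the same { url, type } shape as the resources array in the code above.

```javascript
// Hypothetical helper: count captured entries per resource type.
function countByType(resources) {
  const counts = {};
  for (const { type } of resources) {
    counts[type] = (counts[type] || 0) + 1;
  }
  return counts;
}

// Sample data in the same shape as the `resources` array (made-up URLs).
const sample = [
  { url: 'https://example.com/', type: 'document' },
  { url: 'https://example.com/app.js', type: 'script' },
  { url: 'https://example.com/vendor.js', type: 'script' },
  { url: 'https://example.com/logo.png', type: 'image' },
];

console.log(countByType(sample));
// → { document: 1, script: 2, image: 1 }
```

In a real run, pass the captured resources array to countByType() just before closing the browser.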

Published on: Jun 28, 2024, 12:19 AM  
 
