How to get a web page's HTML source using Puppeteer

You can get the complete HTML of a web page using Puppeteer. Among its methods for reading and manipulating page content, page.content() returns the HTML of the currently loaded page as a string. Here’s how to use it:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();

  try {
    const page = await browser.newPage();

    // Navigate to a website
    await page.goto('https://example.com');

    // Get the page's HTML source
    const pageSource = await page.content();

    console.log('Page Source:', pageSource);
  } finally {
    // Close the browser even if navigation or retrieval throws
    await browser.close();
  }
})();

Explanation:

  1. Launch Puppeteer: Start a Puppeteer-controlled browser instance using puppeteer.launch().

  2. Create a New Page: Open a new browser tab/page with browser.newPage().

  3. Navigate to a Website: Use page.goto('https://example.com') to load a specific URL. Replace 'https://example.com' with the URL of the website you want to access.

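  4. Retrieve Page Source: Use page.content() to get the full HTML of the currently loaded page, including the DOCTYPE. This method returns a Promise that resolves to a string. The string is a serialization of the current DOM, so it reflects any changes made by the page's JavaScript and may differ from the raw HTML the server sent.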

  5. Log or Use the Source: The pageSource variable now contains the entire HTML source code of the page. You can log it to the console, process it further, or save it to a file as needed.

  6. Close the Browser: Always close the browser instance using browser.close() to free up system resources once you've finished using Puppeteer.
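
Steps 3 to 5 can be wrapped in small reusable helpers. The following is a minimal sketch, not part of Puppeteer itself: the function names getPageHtml and savePageSource are hypothetical, and the choice of waitUntil: 'networkidle2' is an assumption that suits pages which load content with JavaScript. It also reads the raw server response through the object that page.goto() resolves to, so you can compare it with the rendered DOM from page.content().

```javascript
const fs = require('fs').promises;

// Hypothetical helper (not a Puppeteer API): loads `url` in an existing
// Puppeteer page and returns both versions of the HTML.
async function getPageHtml(page, url) {
  // 'networkidle2' treats navigation as finished once there have been
  // no more than 2 network connections for at least 500 ms.
  const response = await page.goto(url, { waitUntil: 'networkidle2' });

  // page.goto() can resolve to null (for example on about:blank),
  // so guard before reading the raw response body.
  const raw = response ? await response.text() : null;

  // Serialized DOM after scripts have run, including the DOCTYPE.
  const rendered = await page.content();

  return { raw, rendered };
}

// Hypothetical helper: saves the rendered HTML to `filePath` and returns it.
async function savePageSource(page, url, filePath) {
  const { rendered } = await getPageHtml(page, url);
  await fs.writeFile(filePath, rendered, 'utf8');
  return rendered;
}
```

To use it, pass the page created by browser.newPage(), for example await savePageSource(page, 'https://example.com', 'page.html'), and close the browser afterwards as in step 6. Taking the page as a parameter keeps the helpers independent of how the browser was launched.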

Published on: Jun 28, 2024, 12:20 AM  
 
