How to get a web page's HTML source using Puppeteer
You can get the complete HTML of a web page using Puppeteer. Puppeteer provides several methods for accessing and manipulating page content; page.content() returns the page's HTML as a string. Here’s how to use it:
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();

    // Navigate to a website
    await page.goto('https://example.com');

    // Get the page's HTML source (the serialized DOM, including the doctype)
    const pageSource = await page.content();
    console.log('Page Source:', pageSource);
  } finally {
    // Close the browser even if navigation or retrieval fails
    await browser.close();
  }
})();
Explanation:
- Launch Puppeteer: Start a Puppeteer-controlled browser instance using puppeteer.launch().
- Create a New Page: Open a new browser tab/page with browser.newPage().
- Navigate to a Website: Use page.goto('https://example.com') to load a specific URL. Replace 'https://example.com' with the URL of the website you want to access.
- Retrieve Page Source: Use page.content() to get the full HTML of the currently loaded page. This method returns a Promise that resolves to the HTML content as a string. The string is the serialized current DOM (including the doctype), so it reflects any changes scripts have made and may differ from the raw HTML the server sent.
- Log or Use the Source: The pageSource variable now contains the entire HTML of the page. You can log it to the console, process it further, or save it to a file as needed.
- Close the Browser: Always close the browser instance using browser.close() to free up system resources once you've finished using Puppeteer.
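As a follow-up to the "save it to a file" option mentioned above, here is a minimal sketch of one way to do it. The helper names savePageSource and fetchAndSave, the default output name page.html, and the script name save-source.js are hypothetical and chosen only for illustration. The waitUntil: 'networkidle2' option is also an assumption on my part: it asks Puppeteer to wait until network activity has mostly settled, which can help on script-heavy pages but is not required.

```javascript
const fs = require('fs').promises;
const path = require('path');

// Hypothetical helper: writes the serialized HTML of an already-loaded
// Puppeteer page to filePath and returns the number of characters written.
async function savePageSource(page, filePath) {
  const html = await page.content();
  // Create the target directory if it does not exist yet
  await fs.mkdir(path.dirname(path.resolve(filePath)), { recursive: true });
  await fs.writeFile(filePath, html, 'utf8');
  return html.length;
}

// Hypothetical wrapper: launch, navigate, save, and always close the browser.
async function fetchAndSave(url, filePath) {
  // Required lazily so savePageSource can be reused without Puppeteer loaded
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Assumption: wait for the network to be mostly idle before reading the DOM
    await page.goto(url, { waitUntil: 'networkidle2' });
    return await savePageSource(page, filePath);
  } finally {
    await browser.close();
  }
}

// Runs only when a URL is passed on the command line, e.g.:
//   node save-source.js https://example.com example.html
if (process.argv[2]) {
  const [url, outFile = 'page.html'] = process.argv.slice(2);
  fetchAndSave(url, outFile)
    .then((count) => console.log(`Saved ${count} characters to ${outFile}`))
    .catch((err) => {
      console.error(err);
      process.exitCode = 1;
    });
}
```

Keeping the file-writing step in its own function means it can be called with any page you already have open, rather than launching a new browser each time.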