Cursor (https://www.cursor.so/) is a powerful code editor powered by GPT-4 that assists programmers in their daily coding tasks.
The example here is web scraping: the target is the Baidu Hot Search list on the Baidu homepage, as shown in the image.
I implemented it using JavaScript.
After pressing Ctrl + K, I entered the prompt: scrape the Baidu Hot Search on the website https://www.baidu.com/.
Cursor quickly provides the following code:
// Use axios and cheerio libraries to scrape the Baidu Hot Search on the website https://www.baidu.com/
const axios = require("axios");
const cheerio = require("cheerio");

async function fetchBaiduHotSearch() {
  try {
    const response = await axios.get("https://www.baidu.com/");
    const $ = cheerio.load(response.data);
    const hotSearchList = [];
    $(".hotsearch-item").each((index, element) => {
      const title = $(element).find(".title-content-title").text().trim();
      hotSearchList.push(title);
    });
    console.log(hotSearchList);
  } catch (error) {
    console.error("Error fetching Baidu Hot Search:", error);
  }
}

fetchBaiduHotSearch();
I then asked: can this code run?
GPT-4 explained how the code works and told me to install the axios and cheerio libraries with npm.
After running it, I found that the output was an empty array.
I reported this to GPT-4, and it suggested checking the CSS selectors against the webpage structure.
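One quick way to act on that suggestion is to check whether the class name the selector relies on appears at all in the HTML the script actually received, which may differ from what the browser shows. The sketch below is only an illustration: the helper name classAppearsInHtml is hypothetical and not part of the code Cursor generated, and it uses a simple regular expression rather than a real HTML parser, so treat it as a rough diagnostic.

```javascript
// Hypothetical helper (an assumption, not from the generated code):
// reports whether `className` occurs in any class="..." attribute
// of a raw HTML string. Regex-based, so only a rough check.
function classAppearsInHtml(html, className) {
  const attrPattern = /class\s*=\s*(?:"([^"]*)"|'([^']*)')/g;
  for (const match of html.matchAll(attrPattern)) {
    // match[1] holds a double-quoted value, match[2] a single-quoted one.
    const classes = (match[1] ?? match[2] ?? "").split(/\s+/);
    if (classes.includes(className)) {
      return true;
    }
  }
  return false;
}
```

Inside fetchBaiduHotSearch, one could call classAppearsInHtml(response.data, "hotsearch-item") before parsing. If it returns false, the server probably sent different markup than the browser sees, which points at the request rather than at the selector.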
After confirming that the webpage structure was fine, I reported back to GPT-4. It then suggested setting the User-Agent header of the HTTP request to get past Baidu's anti-scraping measures.
I applied the changes GPT-4 provided, ran the code again, and got the correct result.
Below is the final generated code:
const axios = require("axios");
const cheerio = require("cheerio");

async function fetchBaiduHotSearch() {
  try {
    const response = await axios.get("https://www.baidu.com/", {
      headers: {
        "User-Agent":
          "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3",
        Referer: "https://www.baidu.com/",
      },
    });
    const $ = cheerio.load(response.data);
    const hotSearchList = [];
    $(".hotsearch-item").each((index, element) => {
      const title = $(element).find(".title-content-title").text().trim();
      hotSearchList.push(title);
    });
    console.log(hotSearchList);
  } catch (error) {
    console.error("Error fetching Baidu Hot Search:", error);
  }
}

fetchBaiduHotSearch();