In this post, I'm using `selenium` to demonstrate how to scrape a JavaScript-enabled page. If you have some experience of using Python for web scraping, you have probably already heard of `beautifulsoup` and `urllib`. By using the following code, we will be able to see the HTML and then use HTML tags to extract the desired elements. However, if the web page is embedded with JavaScript, you will notice that some of the HTML elements can't be seen by Beautiful Soup, because they are rendered by the JavaScript. Instead, you will only see the `<script>` tags, which indicate where the JavaScript code is placed. The desired HTML elements are rendered from the `<script>`, so an alternative is needed for this page.

Procedures of Web Scraping using Selenium
1. Prerequisite
- download the Chrome driver from here
- the current stable version is 76.0.3809.126
- choose your operating system (mac/windows/linux)
- extract the webdriver to `CHROME_DRIVER` (e.g. `./chromedriver`)
2. Launch the Chrome Driver
Use `selenium` to launch a Chrome browser by calling `webdriver.Chrome()`. A blank Chrome window should pop up. Now, let's load the page we want to extract.
Use `driver.quit()` to close the browser when you are done with testing.
3. Parse the Webpage
`selenium` provides multiple ways to locate elements in the HTML. By using Chrome Developer Tools (Chrome > More tools > Developer tools), we can easily locate the HTML elements. For example, we're going to extract the link of Details, so we point at the HTML element and copy its XPath location. In `selenium`, we can call `find_elements_by_xpath` to extract all elements matching an XPath pattern. It's worth noticing that this XPath pattern is too specific and only returns the first link instead of all the links. Therefore we need to generalize the XPath pattern to capture all the links.
Let's trace back up the levels of the XPath. Instead of using `tr[1]` to extract the first row, we use `*[contains(@role,'row')]` to capture all the rows with `role='row'`. Then, in each row element, we use the `td/a` XPath to locate the `<a>` tags
. Because the number of links is relatively big, a `tqdm` progress bar is also added to show the progress of the extraction.

4. Save the Data
Finally, we can save the links to a CSV file for later use.
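Steps 3 and 4 can be sketched end to end without a browser, using the standard library's ElementTree on a tiny invented table (a stand-in for the page Selenium would render; note that ElementTree's limited XPath only supports an exact attribute match, where Selenium's full XPath used `contains`):

```python
import csv
import io
import xml.etree.ElementTree as ET

# A tiny stand-in for the rendered page (markup invented for illustration).
HTML = """
<table>
  <tr role="row"><td><a href="/details/1">Details</a></td></tr>
  <tr role="row"><td><a href="/details/2">Details</a></td></tr>
</table>
"""

root = ET.fromstring(HTML)

# Generalized pattern: every element with role="row", then td/a inside it.
links = [a.get("href")
         for row in root.findall(".//*[@role='row']")
         for a in row.findall("td/a")]

# Step 4: save the links to CSV. An in-memory buffer is used here;
# for a real file use open("links.csv", "w", newline="").
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["link"])
writer.writerows([link] for link in links)

print(links)  # -> ['/details/1', '/details/2']
```

In the real script the list comprehension would run over `driver.find_elements_by_xpath(...)` results instead of ElementTree nodes, with `tqdm` wrapped around the outer loop.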
In this post we share how to perform web scraping of a JS-rendered website. The tools, as seen in the header, are Java with the Selenium library driving headless Chrome instances (download driver) and JSoup as the parser to fetch data from the acquired HTML.
You can view the code on GitHub.
ChromeDriver initialization
I have added some arguments to chromeOptions in the code. The driver threw exceptions without them.
Getting an instance of the JSoup `Document` class: we can get the rendered page out of ChromeDriver with `driver.getPageSource()` and hand the HTML string to `Jsoup.parse(...)`.
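As an illustration of the same handoff, the rendered page source string fed into a parser, here in Python with the standard library's `HTMLParser` standing in for JSoup (with a live driver the input would be `driver.page_source`; the sample HTML is invented):

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect href attributes from <a> tags (a stdlib stand-in for a JSoup Document)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

# With a live driver this would be: collector.feed(driver.page_source)
collector = LinkCollector()
collector.feed('<p><a href="/program/1">Details</a></p>')
print(collector.links)  # -> ['/program/1']
```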
Main class (ScrapeData class)
The main work is done in the ScrapeData class, which implements the Runnable interface. Basic actions in the method run:
- visit category pages
- get links to program pages
- scrape data from program pages
- save data in database
The class constructor accepts a link to a site category page and a page number to start from.
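The real class is Java and lives in the linked repo; here is a Python sketch of the same shape, a runnable task holding a category link and a start page, with the four steps above as placeholder methods (all bodies are invented stubs):

```python
import threading

class ScrapeData(threading.Thread):
    """Sketch of the post's Runnable: one task per category page.

    The real class is Java; the method bodies below are placeholders
    for the Selenium/JSoup/MySQL work described in the post.
    """

    def __init__(self, category_url: str, start_page: int = 1):
        super().__init__()
        self.category_url = category_url
        self.start_page = start_page
        self.saved = []

    def run(self):
        for page in self.visit_category_pages():
            for link in self.get_program_links(page):
                data = self.scrape_program_page(link)
                self.save(data)

    def visit_category_pages(self):
        yield f"{self.category_url}?page={self.start_page}"

    def get_program_links(self, page_url):
        return [page_url + "/program/1"]

    def scrape_program_page(self, link):
        return {"url": link}

    def save(self, data):
        self.saved.append(data)

task = ScrapeData("https://example.org/category", start_page=2)
task.start()
task.join()
print(task.saved)
```

The `Runnable` shape matters because the post runs several of these tasks concurrently, one per category page.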
Class StudyPortalsData
The StudyPortalsData class stores the data of a single page.
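As a sketch, such a record type might look like the following (the field names are guesses for illustration; the real Java class defines its own):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class StudyPortalsData:
    # Field names are illustrative guesses, not copied from the repo.
    url: Optional[str] = None
    title: Optional[str] = None
    tuition_fee: Optional[str] = None
    duration: Optional[str] = None

record = StudyPortalsData(url="https://example.org/program/1", title="MSc Example")
print(record.title)  # -> MSc Example
```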
Class ScrapeStudyPortals
The ScrapeStudyPortals class and its main method, scrapeAllDataJSoup, retrieve the data from the current page.
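A hypothetical sketch of that per-page step (the real method uses JSoup selectors; the `<h1>` regex and the sample page below are invented for illustration):

```python
import re

def scrape_all_data(html: str) -> dict:
    """Pull a field out of one program page.

    Sketch only: the real scrapeAllDataJSoup walks the JSoup Document;
    this regex stands in for a single selector.
    """
    title = re.search(r"<h1[^>]*>(.*?)</h1>", html, re.S)
    return {"title": title.group(1).strip() if title else None}

page = "<html><body><h1> MSc Example </h1></body></html>"
print(scrape_all_data(page))  # -> {'title': 'MSc Example'}
```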
Class DataBase
The DataBase class saves data from an instance of StudyPortalsData to the database using the insertStudyPortalsData method. The third-party MySQL Connector/J library is used to connect to the database.
Methods of the DataBase class generally return a DataBase.Status as a result.
Several of these Status values are used in the isDataFull(StudyPortalsData studyPortalsData) method, which checks that all required fields contain data.
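A Python sketch of that contract, with `sqlite3` standing in for MySQL Connector/J (the table name, required fields, and Status values are invented for illustration):

```python
import sqlite3
from enum import Enum

class Status(Enum):
    """Stands in for the post's DataBase.Status (values are invented)."""
    OK = "ok"
    INCOMPLETE = "incomplete"
    ERROR = "error"

class DataBase:
    REQUIRED = ("url", "title")  # invented required fields

    def __init__(self, conn):
        self.conn = conn
        conn.execute("CREATE TABLE IF NOT EXISTS study_portals (url TEXT, title TEXT)")

    def is_data_full(self, data: dict) -> bool:
        """Check that every required field holds data (the isDataFull idea)."""
        return all(data.get(field) for field in self.REQUIRED)

    def insert_study_portals_data(self, data: dict) -> Status:
        if not self.is_data_full(data):
            return Status.INCOMPLETE
        try:
            self.conn.execute(
                "INSERT INTO study_portals (url, title) VALUES (?, ?)",
                (data["url"], data["title"]),
            )
            return Status.OK
        except sqlite3.Error:
            return Status.ERROR

db = DataBase(sqlite3.connect(":memory:"))
print(db.insert_study_portals_data({"url": "https://example.org/p/1", "title": "MSc Example"}))
print(db.insert_study_portals_data({"url": None, "title": "missing url"}))
```

Returning a status enum instead of raising lets the scraping loop log incomplete pages and keep going, which matches the post's description of the method's role.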