DevBolt

XPath for Web Scraping — Practical Guide

XPath is one of the most powerful tools for web scraping because it can select elements by text content, navigate up and down the DOM tree, and handle complex page structures that CSS selectors cannot express.

XPath Tester

Test XPath expressions against XML data with real-time evaluation. Extract elements, filter by attributes, and navigate XML document structures.

XPath Reference
Expression                Description
/                         Root element
//element                 All matching elements anywhere
./child                   Direct child of context
@attr                     Attribute value
[1]                       First element (1-indexed)
[last()]                  Last element
[position()<3]            First two elements
[@attr='val']             Filter by attribute
[contains(., 'text')]     Contains text
[starts-with(@id, 'x')]   Starts with
text()                    Text content
node()                    Any node
count(//el)               Count elements
sum(//el)                 Sum of numeric values
string-length(//el)       String length
ancestor::el              Ancestor axis
descendant::el            Descendant axis
following-sibling::el     Following siblings
parent::el                Parent axis
el1 | el2                 Union of two node sets

About XPath

XPath (XML Path Language) is a query language for selecting nodes from XML documents. It uses path expressions to navigate through elements, attributes, and text in an XML tree structure.

Key concepts:
  • Nodes — elements, attributes, text, comments, and the document itself
  • Axes — define the direction of navigation (child, parent, ancestor, descendant, sibling)
  • Predicates — filter nodes with conditions inside square brackets
  • Functions — built-in string, number, and node functions (contains, count, sum, etc.)

XPath is used in XSLT, XQuery, web scraping (Selenium, Puppeteer), configuration parsing, and XML data extraction. This tool uses your browser's built-in XPath 1.0 engine — no data is sent over the network.
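The concepts above can also be tried outside the browser. The following is a minimal sketch, assuming Python's standard-library xml.etree.ElementTree, which implements only a limited subset of XPath 1.0 (no contains(), text(), count(), or most named axes). The catalog document and all variable names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up sample document, for illustration only.
CATALOG = """
<catalog>
  <book id="b1" category="web"><title>Scraping Basics</title><price>20</price></book>
  <book id="b2" category="data"><title>XML in Depth</title><price>35</price></book>
  <book id="b3" category="web"><title>Selectors</title><price>15</price></book>
</catalog>
"""

root = ET.fromstring(CATALOG.strip())

# Predicate on an attribute: [@attr='val']
web_ids = [b.get("id") for b in root.findall("./book[@category='web']")]

# Positional predicates are 1-indexed: [1] and [last()]
first_id = root.find("./book[1]").get("id")
last_id = root.find("./book[last()]").get("id")

# Predicate on a child element's text: [child='text']
mid_id = root.find("./book[price='35']").get("id")

# Descendant search (//) followed by a parent step (..)
parents_of_price = [p.get("id") for p in root.findall(".//price/..")]

# ElementTree has no count() or sum(); the same values computed in Python
book_count = len(root.findall(".//book"))
price_total = sum(float(p.text) for p in root.iter("price"))

print(web_ids, first_id, last_id, mid_id, parents_of_price, book_count, price_total)
```

For full XPath 1.0 support in Python (functions such as contains() and axes such as following-sibling), a third-party library is needed; the standard library is used here only to keep the sketch dependency-free.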

Common XPath patterns for scraping

The most useful XPath patterns for web scraping:
  • //a[contains(@href, '/product/')] matches product links by URL pattern.
  • //div[contains(@class, 'price')]/text() extracts price text.
  • //table//tr[position()>1]/td scrapes table rows, skipping the header.
  • //img/@src gets all image URLs.
  • //h2/following-sibling::p[1] gets the first paragraph after each heading.
  • //*[contains(text(), 'Add to Cart')]/ancestor::div[@class] finds the container around an 'Add to Cart' button.

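# Selenium (Python)
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()  # assumes Chrome and a matching driver are available
driver.get("https://example.com")  # placeholder URL

# All product containers, then the first button whose text is exactly "Submit"
products = driver.find_elements(By.XPATH, "//div[@class='product']")
submit = driver.find_element(By.XPATH, "//button[text()='Submit']")

// Puppeteer (JavaScript)
// page.$x() exists in Puppeteer releases before v22. It was removed in v22,
// where the "xpath/" selector prefix with page.$$() takes its place.
const elements = await page.$x("//a[contains(@href, '/item/')]")
const texts = await page.$x("//span[@class='price']/text()")

# Scrapy (Python), inside a spider callback where `response` is the page response
response.xpath("//h1/text()").get()  # first match as a string, or None
response.xpath("//ul[@class='nav']//a/@href").getall()  # list of all matches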

Handling dynamic and complex pages

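Modern web pages often use generated class names (e.g., css-1a2b3c) that change between builds. When a class has a stable fragment, XPath can match on it: //*[contains(@class, 'product')] matches any element whose class attribute contains the substring 'product', which also includes values such as 'product-list'. For pages with multiple similar sections, use positional predicates: (//div[@class='card'])[3] selects the third card in the whole document. The parentheses matter: without them, //div[@class='card'][3] selects every card that is the third matching div within its own parent. XPath does not cross shadow DOM or iframe boundaries, so switch context in your scraping tool first before applying XPath to that content.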
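The positional predicate is easy to get wrong, because an index applied directly to a step counts within each parent rather than across the whole result. A small sketch of the difference, assuming Python's standard-library xml.etree.ElementTree (a limited XPath subset without parenthesized expressions, so the document-wide index is taken in Python instead); the markup is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up markup: two lists with two items each.
PAGE = """<div>
  <ul><li>a1</li><li>a2</li></ul>
  <ul><li>b1</li><li>b2</li></ul>
</div>"""

root = ET.fromstring(PAGE)

# Like //li[2]: every li that is the second li inside its own parent.
second_per_parent = [li.text for li in root.findall(".//li[2]")]

# Like //li[3]: no list has a third item, so nothing matches.
third_per_parent = root.findall(".//li[3]")

# Like (//li)[3]: the third li in document order, taken by indexing
# the full result list (Python lists are 0-indexed, XPath is 1-indexed).
all_items = root.findall(".//li")
third_overall = all_items[2].text

print(second_per_parent, third_per_parent, third_overall)
```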

Testing XPath before scraping

Always test your XPath expressions before writing scraping code. You can use this tool by pasting the page's HTML source, or test directly in browser DevTools: open the Console and run document.evaluate("//your/xpath", document, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null). In Chrome DevTools Elements panel, press Ctrl+F and type your XPath expression to highlight matching elements on the page.

Frequently Asked Questions

How do I find the XPath of an element in Chrome DevTools?

Right-click the element on the page, select Inspect, then right-click the highlighted element in the Elements panel and choose Copy → Copy XPath (a short path, often anchored on a nearby id) or Copy → Copy full XPath (the absolute path from /html). You can also press Ctrl+F (Cmd+F on macOS) in the Elements panel and type an XPath expression to search. Chrome highlights matching elements and shows the count, which makes this a quick way to test and refine XPath queries.

How do I handle namespaces in XPath?

XML namespaces can break XPath queries: in XPath 1.0 an unprefixed name such as //element matches only elements in no namespace, so it won't match <ns:element> or an element under a default xmlns declaration. Solutions: (1) use local-name(): //*[local-name()='element'] matches on the local name and ignores the namespace, (2) in code, register a namespace resolver that maps prefixes to URIs, (3) strip namespaces from the XML before querying if you control the input. Most web scraping targets HTML rather than XML, so namespaces are rarely an issue.
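The three options can be sketched in Python with the standard-library xml.etree.ElementTree. This is an illustration under assumptions: the feed is made up, the "atom" prefix is an arbitrary choice, and ElementTree has no local-name() function, so its {*} wildcard (Python 3.8+) stands in for option 1.

```python
import xml.etree.ElementTree as ET

# Made-up Atom-style feed; the default xmlns puts every element in a namespace.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>First</title></entry>
  <entry><title>Second</title></entry>
</feed>"""

root = ET.fromstring(FEED)

# An unprefixed name only matches elements in no namespace, so this finds nothing.
unqualified = root.findall(".//title")

# Option 1 analogue: the {*} wildcard ignores the namespace, much like local-name().
wildcard = [t.text for t in root.findall(".//{*}title")]

# Option 2: map a prefix of your choosing to the namespace URI.
ns = {"atom": "http://www.w3.org/2005/Atom"}
mapped = [t.text for t in root.findall(".//atom:entry/atom:title", ns)]

# Option 3: strip namespaces from the parsed tree, then query with plain names.
for el in root.iter():
    el.tag = el.tag.split("}", 1)[-1]
stripped = [t.text for t in root.findall(".//title")]

print(unqualified, wildcard, mapped, stripped)
```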

What is the best XPath strategy for stable scraping?

For resilient scrapers: (1) prefer semantic attributes over generated ones — @id, @name, @role, data-* attributes are more stable than CSS class names, (2) use text content as anchors — //label[text()='Email']/following::input[1] survives layout changes, (3) avoid deep absolute paths like /html/body/div[3]/div[2] because they break when the page structure changes, (4) combine contains() with class fragments for partial matching, (5) test with multiple pages to ensure your XPath works across variations.
