Webgrph
Site structure mapping, crawl, and hierarchy
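Webgrph maps the full structure of a website. It supports artifact chaining: passing the `artifact_id` of a page retrieved by an earlier call links a new crawl to that artifact.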
crawl
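```python
# `client` is an initialized SDK client; `page` is the result of an earlier
# call whose artifact_id this crawl chains from.
result = client.webgrph.crawl(
    "https://example.com",
    max_depth=3,
    max_pages=500,
    artifact_id=page.artifact_id,
)
print(result.total_pages, result.job_id)
```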
```javascript
// `client` is an initialized SDK client; `page` comes from an earlier call.
const result = await client.webgrph.crawl('https://example.com', {
  maxDepth: 3,
  artifactId: page.artifactId,
});
```
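To make the two limits concrete, here is a minimal local sketch of a breadth-first crawl over an in-memory link graph. It assumes that `max_depth` caps the number of link hops from the start URL (the start page is depth 0) and that `max_pages` caps the total number of pages visited. Webgrph's actual traversal order and limit semantics may differ; `bounded_crawl` and the `site` mapping are hypothetical.

```python
from collections import deque


def bounded_crawl(links, start, max_depth, max_pages):
    """Breadth-first crawl over a {url: [linked urls]} mapping.

    Assumption: depth counts link hops from `start` (start is depth 0),
    and the crawl stops once `max_pages` pages have been visited.
    """
    visited = {}  # url -> depth at which it was first reached
    queue = deque([(start, 0)])
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        if url in visited:
            continue  # already reached by a shorter or equal path
        visited[url] = depth
        if depth == max_depth:
            continue  # do not follow links beyond the depth limit
        for child in links.get(url, []):
            if child not in visited:
                queue.append((child, depth + 1))
    return visited


site = {
    "/": ["/a", "/b"],
    "/a": ["/a/1", "/a/2"],
    "/b": ["/b/1"],
    "/a/1": ["/a/1/x"],
}
# 6 pages: /a/1/x sits at depth 3 and is skipped.
print(bounded_crawl(site, "/", max_depth=2, max_pages=500))
# Page cap reached after /, /a, /b.
print(bounded_crawl(site, "/", max_depth=3, max_pages=3))
```

Breadth-first order means that when the page cap cuts a crawl short, the pages kept are the ones closest to the root.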
crawl_all (auto-pagination)
```python
# crawl_all handles pagination automatically and yields pages one at a time.
for p in client.webgrph.crawl_all("https://example.com", max_depth=3):
    print(p.url, p.depth)
```
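Auto-pagination generally means the client keeps requesting the next page of results and yields items one by one until the server signals there are no more. The sketch below shows that pattern with a hypothetical `fetch_page(cursor)` function returning `(items, next_cursor)`. It is not Webgrph's implementation, and the string-cursor scheme is an assumption.

```python
from typing import Any, Callable, Iterator, List, Optional, Tuple

# Hypothetical page fetcher: takes a cursor (None for the first page) and
# returns the items on that page plus the cursor for the next one.
Page = Tuple[List[Any], Optional[str]]


def iterate_all(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[Any]:
    """Yield every item across all pages, requesting each page lazily."""
    cursor: Optional[str] = None
    while True:
        items, cursor = fetch_page(cursor)
        yield from items
        if cursor is None:  # no next page: the listing is exhausted
            return


# Fake backend that serves 5 results, 2 per page.
RESULTS = [f"https://example.com/p{i}" for i in range(5)]


def fake_fetch(cursor: Optional[str]) -> Page:
    start = int(cursor) if cursor else 0
    end = start + 2
    next_cursor = str(end) if end < len(RESULTS) else None
    return RESULTS[start:end], next_cursor


for url in iterate_all(fake_fetch):
    print(url)
```

Because the generator only fetches the next page once the current one is consumed, breaking out of the loop early avoids requesting pages you never use.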
crawlStream (SSE)
```javascript
// Events arrive over Server-Sent Events (SSE) as the crawl progresses.
for await (const event of client.webgrph.crawlStream('https://example.com')) {
  console.log(event.type, event.url, event.depth);
}
```
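`crawlStream` delivers events over SSE, a text protocol in which each event is a block of `field: value` lines terminated by a blank line. As a rough illustration of what travels over the wire, the Python sketch below parses such a stream into dictionaries. It assumes each event's `data` field carries a JSON object with `type`, `url`, and `depth` keys; Webgrph's real payload format is not documented here, and `parse_sse` is hypothetical.

```python
import json
from typing import Any, Iterable, Iterator, List


def parse_sse(lines: Iterable[str]) -> Iterator[Any]:
    """Parse SSE lines into decoded JSON payloads, one per event.

    Minimal subset of the SSE format: `data:` lines are joined with
    newlines, `:` lines are comments, and a blank line ends an event.
    Other fields (event, id, retry) are ignored in this sketch.
    """
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data:
                # Assumption: the payload of each event is JSON.
                yield json.loads("\n".join(data))
                data = []
            continue
        if line.startswith(":"):
            continue  # comment or keep-alive line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # the spec strips one leading space
        if field == "data":
            data.append(value)
    # An event not terminated by a blank line at end of stream is discarded.


stream = [
    ": keep-alive\n",
    'data: {"type": "page", "url": "https://example.com/", "depth": 0}\n',
    "\n",
    'data: {"type": "page", "url": "https://example.com/about", "depth": 1}\n',
    "\n",
]
for event in parse_sse(stream):
    print(event["type"], event["url"], event["depth"])
```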
get_hierarchy
```python
# Returns the site as a tree of PageNode objects rooted at tree.root.
tree = client.webgrph.get_hierarchy("https://example.com")
print(tree.root.url, len(tree.root.children))
```
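Because each node's `children` is itself a list of nodes, the tree returned by `get_hierarchy` can be walked with an explicit stack. This sketch flattens a hierarchy into `(depth, url)` pairs in depth-first order. It only assumes nodes expose `url`, `depth`, and `children` as listed in the PageNode fields table; `SimpleNamespace` objects stand in for real SDK nodes, and `walk` and `node` are hypothetical helpers.

```python
from types import SimpleNamespace
from typing import Any, Iterator, Tuple


def walk(root: Any) -> Iterator[Tuple[int, str]]:
    """Yield (depth, url) for every node, parents before children."""
    stack = [root]
    while stack:
        current = stack.pop()
        yield current.depth, current.url
        # Push children in reverse so they are visited in their listed order.
        stack.extend(reversed(current.children))


def node(url, depth, *children):
    # Stand-in for an SDK PageNode (hypothetical helper for this example).
    return SimpleNamespace(url=url, depth=depth, children=list(children))


root = node(
    "https://example.com", 0,
    node("https://example.com/docs", 1, node("https://example.com/docs/api", 2)),
    node("https://example.com/blog", 1),
)
for depth, url in walk(root):
    print("  " * depth + url)
```

With a real result you would pass `tree.root` instead. The explicit stack avoids Python's recursion limit on very deep sites.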
PageNode fields
| Field | Type | Description |
|-------|------|-------------|
| `url` | string | Page URL |
| `depth` | integer | Depth from the root page |
| `status_code` | integer | HTTP status code |
| `title` | string or null | Page title |
| `children` | list of PageNode | Child pages |
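If you hold hierarchy data as JSON (for example, a saved export), the fields above map naturally onto a typed Python structure. This is a local sketch: the `PageNode` dataclass, its `from_dict` constructor, the `failing_pages` helper, and the assumption that JSON keys match the field names are illustrative, not part of the SDK.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class PageNode:
    """Local mirror of the PageNode fields (hypothetical, not the SDK class)."""

    url: str
    depth: int
    status_code: int
    title: Optional[str] = None  # "string or null" in the table
    children: List["PageNode"] = field(default_factory=list)

    @classmethod
    def from_dict(cls, d: Dict[str, Any]) -> "PageNode":
        # Assumption: JSON keys are identical to the field names.
        return cls(
            url=d["url"],
            depth=d["depth"],
            status_code=d["status_code"],
            title=d.get("title"),
            children=[cls.from_dict(c) for c in d.get("children", [])],
        )


def failing_pages(node: PageNode) -> List[PageNode]:
    """Collect nodes whose HTTP status is outside the 2xx range."""
    bad = [] if 200 <= node.status_code < 300 else [node]
    for child in node.children:
        bad.extend(failing_pages(child))
    return bad


raw = json.loads("""
{"url": "https://example.com", "depth": 0, "status_code": 200, "title": "Home",
 "children": [
   {"url": "https://example.com/old", "depth": 1, "status_code": 404, "title": null},
   {"url": "https://example.com/docs", "depth": 1, "status_code": 200, "title": "Docs"}
 ]}
""")
root = PageNode.from_dict(raw)
print([p.url for p in failing_pages(root)])
```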