Web Scraping Specialist
Wynd Labs
$70,000 - $140,000 Yearly
USA
🌎 Remote
Category: Engineering
Subcategory: Data Engineering
Type: Full-time
Who We Are:
We build infrastructure that delivers massive amounts of web data to the companies training the world’s most powerful AI models.
We're the team that helps to power and support Grass, a bandwidth-sharing network that lets us operate a massive distributed crawler, giving us unique access to high-quality public web data at global scale. On top of that, we’ve built pipelines for ingesting, segmenting, and annotating billions of videos, transcripts, and audio files, powering dataset creation for frontier labs.
We’re lean, technical, and move fast. No red tape, no slow decision-making; just a team of builders pushing to expand what’s possible for open web data and AI.
The Role.
We are seeking a Web Scraping Specialist with significant experience in data extraction and web scraping techniques. You will join a small, specialized team and lead efforts to gather and analyze data, optimize scraping processes, and support our vision for a future where Grass plays a crucial role in transforming internet data accessibility.
Who You Are.
- Demonstrated ability to extract data from complex websites with minimal supervision, with a portfolio or examples of past projects.
- Proficiency in languages such as Python or JavaScript, with strong skills in libraries and frameworks like BeautifulSoup, Scrapy, or Selenium.
- Knowledge of asynchronous programming, multithreading, and distributed scraping.
- In-depth knowledge of HTML, CSS, JavaScript, and the Document Object Model (DOM).
- Experience with NoSQL databases (MongoDB, Cassandra), including designing efficient storage solutions and managing data integrity.
- Ability to apply machine learning algorithms for data cleaning, categorization, or predictive analysis, which adds significant value.
- Experience with cloud services (AWS, Google Cloud, Azure) for deploying and managing scraping jobs at scale.
- Active participation in open-source projects related to web scraping, data processing, or similar fields.
What You'll Be Doing.
- Write, test, and refine code that extracts data from various online sources, ensuring reliability and efficiency.
- Perform data retrieval tasks, handling complexities such as pagination and dynamic content loaded with AJAX.
- Clean and format extracted data, ensuring it meets quality standards for further analysis or processing.
- Database management: Store and manage the scraped data in appropriate databases, optimizing for access speed and data integrity.
- Regularly monitor the scraping processes, identifying and resolving any issues to maintain continuous data flow.
Why Work With Us:
- Opportunity. We are at the forefront of developing a web-scale crawler and knowledge graph that allows ordinary people to participate in the process and share in the benefits of AI development.
- Culture. We’re a lean team working together to achieve a very ambitious goal of improving access to public web data and distributing the value of AI to the people. We prioritize low ego and high output.
- Work remotely.
- Compensation. You’ll receive a competitive salary, benefits, and equity package.
Wynd Labs
Wynd Labs provides large-scale access to public web data for research, analytics, and business intelligence.
Wynd Labs provides extensive access to public web data via a distributed proxy network designed to support research, analytics, and business intelligence. The platform is engineered for scalability and offers competitive pricing.