Ethical Scraping

We all 'extract' web data.
That's the WHOLE point of the internet!

Some of us specialise, without even knowing it.
In our little corner of the internet you will find Data Scientists, Marketers, Businesses, Analysts, Hackers, Researchers, Search Engines, Marketing Companies, the data curious, and so many more. Everyone really!

Throughout our development we have always paid close attention to what is ethically reasonable and what is not. The law regarding scraping web data is complicated and will one day see reforms, but that is probably a very long time away. It’s not that no one is thinking, or writing, about the ethics of scraping, or harvesting, but rather that those scraping and those being scraped can’t agree on basic principles. It’s a well-trodden path of arguments, precedents and confusion.

Having been in the industry so long, we have been on both sides of this coin. We scrape data mostly for personal, commercial and above all research projects, and we have also employed it as a form of data collection, targeting assets that clients assert they have legal rights to.

On the other side, we have always battled with bots when it comes to analytics and protecting our clients' data.

To keep the efforts of aisite.ai ethical, we adhere to the following principles.

If a target has a public API that provides the data we need we will use it and avoid scraping.
We will always provide a User Agent string that makes our intentions clear and provides a way for you to contact us with any concerns.
We request data at a reasonable rate. We strive never to be detected or characterised as a DDoS attack, because we are not one.
We will only store the data we absolutely need from our target. For example, if all we need is OpenGraph metadata, that is all we will harvest.
We will respect any content we do store.
We will NEVER pass off harvested data as our own.
We will always look to create value for your quest to migrate.
If a target has a nofollow directive or a robots.txt file, we will respect it.
We will ignore orphan pages.
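
As an illustration only, here is a minimal sketch of how a few of the principles above (a clear User Agent, a reasonable request rate, respect for robots.txt and nofollow, and storing only OpenGraph metadata) might look in Python using just the standard library. The User Agent string, the contact URL and every helper name below are hypothetical and chosen for the example; this is not a description of how aisite.ai is actually built.

```python
import time
from html.parser import HTMLParser
from urllib import robotparser

# Hypothetical identity: a real scraper would name itself and give a contact route.
USER_AGENT = "ExampleScraper/1.0 (+https://example.com/contact)"


def is_allowed(robots_txt, url, user_agent=USER_AGENT):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class RateLimiter:
    """Enforce a minimum number of seconds between consecutive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        # The first call never sleeps; later calls sleep off the remaining interval.
        if self._last is not None:
            remaining = self.min_interval - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()


class MetaParser(HTMLParser):
    """Collect only og:* properties and note a robots nofollow directive."""

    def __init__(self):
        super().__init__()
        self.open_graph = {}
        self.nofollow = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        content = attr.get("content") or ""
        prop = attr.get("property") or ""
        if prop.startswith("og:"):
            self.open_graph[prop] = content
        elif (attr.get("name") or "").lower() == "robots":
            directives = {d.strip().lower() for d in content.split(",")}
            if "nofollow" in directives:
                self.nofollow = True


def extract_meta(html):
    """Return (open_graph_dict, nofollow_flag) and discard everything else."""
    parser = MetaParser()
    parser.feed(html)
    parser.close()
    return parser.open_graph, parser.nofollow
```

In a real run, each request would send USER_AGENT in its User-Agent header, call wait() on a shared RateLimiter before going out, and be skipped entirely when is_allowed() returns False. The clock and sleep parameters exist only so the limiter can be exercised without real delays.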

Our Policy (for now) as of 2023

If it is publicly available and viewable (a human can see it online), then we treat the data as public.

Global Rules

This list is by no means comprehensive, but it shows some examples of organisations and practices that seek to define what is ethical and what is not in the field of data scraping.

You have SOPA on one hand and STOP CENSORSHIP on the other.
Then there is technical and non-technical censorship.

SOPA – https://en.wikipedia.org/wiki/Stop_Online_Piracy_Act

On the other side:

Here is the whole history of internet censorship, told better than we could ever write it.

https://en.wikipedia.org/wiki/Internet_censorship
