Kinh Ba
@kinh3
Rewiring the Matrix: Grass's Blueprint for AI's Data Democracy. Grass is democratizing data in the AI revolution by turning millions of internet users into data miners. This article explores the current challenges we face in AI data and how Grass, as a democratic data layer for AI, works to foster a more equitable and innovative AI landscape.

Emmauel
@vinhtuong
In the AI era, data is the new gold. AI models and algorithms process vast amounts of data to generate insights, make predictions, and solve complex problems. Unlike in historical gold rushes, this digital treasure is generated by users but controlled by a handful of tech conglomerates. The concentration of data sources has created a digital divide, leading to data opacity, undervaluing contributors, and hindering competition from AI startups and open-source projects.

Tuổi Trẻ Cười
@tuoitrecuoi
Grass is a decentralized network aiming to democratize AI data access by leveraging crowdsourced bandwidth for web scraping.

Suner
@bronu
Both the quality and quantity of data significantly impact AI models’ accuracy and effectiveness. Biased or insufficient data can lead to unreliable outcomes, underscoring the importance of robust, accessible data provision for building reliable AI systems.

Pravas
@khet
AI models use data as their foundation, learning and improving by analyzing vast datasets. In supervised learning, one of the most common forms of machine learning, models are trained on labeled data, where each data point is paired with a corresponding output. By identifying patterns and relationships within that data, models can make predictions or decisions about new, unseen data.
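A minimal Python sketch of the supervised-learning idea described above: a model is fit on labeled (input, output) pairs and then predicts the output for an input it has never seen. Least-squares line fitting is used here only as the simplest possible illustration; it is not how large AI models are trained, and the data points are made up.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept to labeled pairs by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (unnormalized; the n cancels out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned pattern to a new, unseen input."""
    slope, intercept = model
    return slope * x + intercept


# Made-up labeled training data: each input is paired with its output (here y = 2x + 1).
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]

model = fit_line(train_x, train_y)
print(model)                 # (2.0, 1.0)
print(predict(model, 10.0))  # 21.0, for an input never seen during training
```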

Youre
@dtmyxuyenst
This article explores the current challenges we face in AI data and how Grass, as a democratic data layer for AI, works to foster a more equitable and innovative AI landscape.

85cent
@salonpas
Currently, large tech conglomerates like Google, Microsoft, and Facebook are dominating the AI ecosystem. These companies have vast resources and built-in advantages to collect and process data for training AI models. While these giants have undeniably propelled AI development and adoption, their concentrated power—particularly over data—has fostered a digital divide. The imbalance severely restricts data access for smaller entities and impedes innovation across the sector. In fact, a select few tech conglomerates effectively control the majority of internet data, capitalizing on user-generated content while simultaneously erecting barriers to competitor access.

Ca Non
@canon
More significantly, the benefits of AI's growth are largely flowing to those who control the data, leaving the wider public, whose online activities generate that valuable data, without a stake in the AI revolution.

Vét Láp
@vestlab
Data accessibility is another emerging problem: smaller companies and open-source projects are struggling with limited access to public web data for AI training. More and more large social platforms have begun charging, or raising fees, for API access. For example, X stopped offering free access to its API in February 2023. Soon after, in April 2023, Reddit announced its intention to charge for its API, which had been free since 2008. The steep cost of data is becoming a significant barrier for smaller players seeking to engage in AI development.

Du Tho
@dutho
The giants’ data monopoly has raised a number of concerns in the AI industry. For instance, the lack of transparency about AI training data sources is increasingly worrying. According to the October 2023 transparency study by Stanford researchers, transparency across AI system development is generally low, and transparency about data is particularly poor. The opacity surrounding AI training data sources also raises concerns about bias and reliability, while the inability to verify data provenance opens the door to manipulation and misinformation.

iguverse
@iguverse.eth
While Reddit benefits financially, the users whose posts and interactions form the backbone of the valuable dataset see no direct compensation. The current model doesn't provide a way for average internet users to benefit from the value of the data they generate through their online activities.

Gop
@guop
A stark example is Google’s recent $60 million deal with Reddit, which allows the search giant to train its AI models on user-generated content from millions of Reddit posts and comments.

Nguyễn Phương Nhi
@beautiverse
In response to these challenges, innovative solutions are emerging to democratize access to web data and remodel the AI data landscape. One such pioneering project is Grass, which aims to address the inadequacy of traditional data accessibility solutions by leveraging the power of decentralized networks.

Matngot
@matngot
Traditionally, web scraping from data centers has been a cost-efficient way to access large amounts of data. However, as more websites try to protect their data, scraping from data centers can be easily detected and blocked. Websites can identify and block IP addresses associated with known data centers. Additionally, data center traffic often shows patterns typical of automated scraping (high volume, consistent timing) and lacks the variability and "noise" of real user behavior, making it easier to identify and target.
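To illustrate how a website might flag data-center scrapers, here is a rough Python sketch assuming two simple heuristics taken from the description above: an IP-range blocklist, and a check for high request rates or unnaturally regular timing. The ranges, thresholds, and function names are hypothetical; the ranges are reserved documentation blocks (RFC 5737), and real anti-bot systems are considerably more sophisticated.

```python
import ipaddress
import statistics

# Hypothetical "known data center" ranges. These are RFC 5737 documentation
# blocks used as placeholders; a real site would use curated IP/ASN lists.
DATACENTER_RANGES = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_datacenter_ip(ip: str) -> bool:
    """True if the address falls inside one of the listed data center ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in DATACENTER_RANGES)


def looks_automated(timestamps, max_requests_per_minute=60, min_jitter_seconds=0.5):
    """Flag high volume or very consistent timing between requests (hypothetical thresholds)."""
    if len(timestamps) < 3:
        return False  # too few requests to judge
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    duration = timestamps[-1] - timestamps[0]
    # Approximate request rate over the observed window.
    rate = len(gaps) / duration * 60 if duration > 0 else float("inf")
    # Real users are "noisy": their gaps vary. Bots often fire on a fixed schedule.
    jitter = statistics.pstdev(gaps)
    return rate > max_requests_per_minute or jitter < min_jitter_seconds


def should_block(ip, timestamps):
    return is_datacenter_ip(ip) or looks_automated(timestamps)


bot_times = [0.0, 1.0, 2.0, 3.0, 4.0]            # one request per second, like clockwork
human_times = [0.0, 4.2, 11.9, 13.1, 30.5]       # irregular, human-like gaps
print(should_block("192.0.2.10", bot_times))     # True: timing is too regular
print(should_block("192.0.2.10", human_times))   # False
print(should_block("203.0.113.7", human_times))  # True: listed data center range
```

The first address is outside the placeholder blocklist, so only the timing check can catch it; this is the gap that residential, human-paced traffic exploits.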

Đà Nẵng
@chetruoi
There are some open-source datasets available to AI researchers, such as Wikidata and the Open Data Initiative, that aim to promote data availability in the space. But these free, open knowledge bases face several challenges, including long-term sustainability and scalability. Because many open-source datasets rely on grants, donations, or volunteer efforts, their funding can be inconsistent or insufficient for long-term maintenance. Meanwhile, as datasets grow, the costs and technical challenges of hosting and distributing large volumes of data rise steeply.

Cây Thúi
@chicken
So why don't we circumvent the giants and build open-source datasets, or simply scrape directly from open websites? Indeed, some smart minds have already tried, but these efforts appear utterly inadequate.

Vien Tin
@vientin
A number of people still "simply view the Data Wars as infighting between factions of the Silicon Valley elite with no possible benefit to themselves". The lack of incentives and inequitable distribution will no doubt limit the participation of ordinary people in the AI revolution.

Gacon
@gacon
Notably, Grass users can earn points or token rewards for renting out their idle bandwidth. On the data consumption side, a wide range of AI startups, developers, and researchers can easily access this crowdsourced data in a cost-efficient way.

Khế Ngọt
@khengot
Put simply, Grass works by harnessing the power of the crowd to create a decentralized web scraping network. Imagine millions of everyday internet users turning their devices into tiny, efficient data collectors. That's the core innovation of Grass. Users install a node on their internet-connected devices, which then uses their excess bandwidth to scrape public web data. These mini data miners form a vast, distributed network that is very difficult for websites to detect or block.
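A toy Python sketch of the crowdsourced model described above, assuming that each node advertises a spare-bandwidth budget, that scraping jobs go to whichever node has the most capacity left, and that points are credited in proportion to bytes served. The Node class, the scheduling rule, and points_per_mb are all hypothetical illustrations, not Grass's actual protocol or reward formula.

```python
from dataclasses import dataclass


@dataclass
class Node:
    node_id: str
    spare_bandwidth_bytes: int  # hypothetical per-round budget the user chooses to share
    bytes_served: int = 0

    def remaining(self) -> int:
        return self.spare_bandwidth_bytes - self.bytes_served


def assign_jobs(jobs, nodes):
    """Assign each (url, expected_size) job to the node with the most spare capacity left."""
    assignments = {}
    for url, size in jobs:
        candidates = [n for n in nodes if n.remaining() >= size]
        if not candidates:
            continue  # no node has enough spare bandwidth this round; job is skipped
        node = max(candidates, key=lambda n: n.remaining())  # ties go to the earlier node
        node.bytes_served += size
        assignments[url] = node.node_id
    return assignments


def reward_points(nodes, points_per_mb=1.0):
    """Hypothetical reward rule: points proportional to megabytes of bandwidth contributed."""
    return {n.node_id: n.bytes_served / 1_000_000 * points_per_mb for n in nodes}


nodes = [Node("alice", 3_000_000), Node("bob", 2_000_000)]
jobs = [
    ("https://example.com/a", 1_000_000),
    ("https://example.com/b", 1_000_000),
    ("https://example.com/c", 1_000_000),
    ("https://example.com/d", 1_500_000),
]
assignments = assign_jobs(jobs, nodes)
print(assignments)
print(reward_points(nodes))  # {'alice': 2.0, 'bob': 1.0}
```

Here alice is assigned pages a and b, bob gets c, and d is skipped because neither node has 1.5 MB of spare capacity left; spreading requests across many residential nodes is what makes the traffic harder to block than a single data center.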

Lam-May
@lammay
Grass is a decentralized network that allows anyone with an internet connection to install a node on their device and contribute their unused bandwidth for web scraping purposes. It aims to empower open source AI by giving smaller players access to verifiable training data and, at the same time, compensate users for a practice that has been happening for 20-30 years without proper remuneration.