Are there any ongoing AI projects which use the Stack Exchange for machine learning?
2 Answers
There certainly appear to have been research projects that apply text mining, information retrieval, and similar techniques to Stack Exchange sites.
Some examples I was able to find through Google and Google Scholar (unlikely to be anywhere near an exhaustive list):
- TACIT: An open-source text analysis, crawling, and interpretation tool describes numerous text crawlers for a variety of sites (including Stack Exchange sites, but also Twitter, Reddit, etc.). At first glance, this appears to be primarily about crawling, not about doing anything else with the data afterwards. Searching Google Scholar for papers that cite this one may still yield interesting results, since it may lead to work that used the tool for crawling and did more with the data afterwards.
- Chaff from the Wheat: Characterization and Modeling of Deleted Questions on Stack Overflow describes research into the quality of Stack Overflow questions (specifically, predicting whether a question will get deleted). I'm not 100% sure if this is the kind of thing you're interested in; it is Stack Exchange + Machine Learning as implied by the title of your question, but not necessarily about retaining information from answers as implied by the text of your question.
- Text mining stackoverflow: An insight into challenges and subject-related difficulties faced by computer science learners also describes text mining in StackOverflow questions and answers, though at a very quick glance it appears to be primarily about topic detection etc., not necessarily about automated question answering for example.
- Different Facets of Text Based Automated Question Answering System appears to be a relatively recent survey on the topic of Automated Question Answering research. Stack Exchange is mentioned a few times as an example of a source of data for such systems, but doesn't appear to be used otherwise.
- Extending PythonQA with Knowledge from StackOverflow is specifically about incorporating Questions and Answers from StackOverflow in an automated Question and Answering system for questions about the Python programming language. The paper provides a link to more details (http://pythonqas2.epl.di.uminho.pt), but that link appears to be down. I suppose you could always try contacting the authors directly if you're interested in more information on this.
More generally, automated question answering still appears to be a rather active area of research, not a trivial or "solved" problem. Stack Exchange can be one source of data for such systems, but there are plenty of other sources of data too (Wikipedia, Quora, etc.).

DuckDuckGo learns answers to technical questions from Stack Exchange. Type a technical question like "ongoing projects use stackexchange" into DuckDuckGo and it will provide a highlighted summary of the answer on the right-hand side. DuckDuckGo also has an open API covering many more (hundreds of) question-answering data sources, or you can go directly to the Stack Exchange API.
Projects can use the data from the SE open API as long as they comply with its terms of use (TOU). Basically, just make sure your users can tell that the data came from Stack Exchange. The copyright license may also limit your ability to alter the contents of the text with, say, a learned abstractive summarizer. Perhaps that is why DuckDuckGo just highlights keywords.
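As a rough illustration of the two points above (querying the Stack Exchange API and keeping the source visible to users), here is a minimal sketch. The API version `2.3`, the helper names `build_search_url` and `attribution_line`, and the attribution wording are my own assumptions for illustration; check the current API documentation and the terms of use for what is actually required.

```python
from urllib.parse import urlencode

# Assumption: version 2.3 of the public Stack Exchange API.
API_ROOT = "https://api.stackexchange.com/2.3"


def build_search_url(query, site="stackoverflow", pagesize=5):
    """Build a URL for the /search/advanced endpoint (hypothetical helper)."""
    params = {
        "order": "desc",
        "sort": "relevance",
        "q": query,
        "site": site,
        "pagesize": pagesize,
    }
    return f"{API_ROOT}/search/advanced?{urlencode(params)}"


def attribution_line(item, site_name="Stack Overflow"):
    """Format one result so the reader can see where it came from.

    `item` is assumed to look like an entry of the "items" list in the
    API's JSON response (with "title", "link", and an optional "owner").
    The wording is illustrative, not a legally vetted attribution.
    """
    owner = item.get("owner", {}).get("display_name", "unknown author")
    return f'"{item["title"]}" by {owner}, via {site_name}: {item["link"]}'


if __name__ == "__main__":
    # Fetching the URL (for example with urllib or requests) is left out
    # so the sketch runs without network access.
    print(build_search_url("automated question answering", site="ai"))
```

Passing each returned item through something like `attribution_line` before display keeps the title, author, and link together, which is one simple way to make the origin of the data obvious.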
Data rights law is in flux, especially when it comes to the data you submit to a site and the machine learning models derived from that data. New European data and privacy rules empower you to download or delete all data you submit to a site like Stack Exchange.
