Neighbours, the most similar users
Description
When users create content, they form meta-information automatically. Each user's content can be compared to the content of others so that similarities can be counted. Those users having the highest similarity values will be called "neighbours" (neighbors).
This upcoming module will do these comparisons and will provide a user tab and a block that show the neighbours of a user. The aim is to develop a clone of the neighbour functions that can be seen at www.last.fm.
This module will be designed to fulfil the needs of very large communities, so the calculation will be run from a cron hook. The calculated value should be interpretable; the current model gives a value in percent. The calculation will be based on the tags users give to their own content. Users who share the same tags in the most similar (relative) frequency become nearest neighbours. Depending only on tags will make development easy and calculation fast, and will encourage users to tag, tag, tag!
Requirements
This module requires Drupal 5.x with modules taxonomy and tagadelic installed.
This module does not yet offer PostgreSQL support. If you would like to contribute to this module by creating the appropriate PostgreSQL schema, please submit your code at http://drupal.org/project/issues/neighbours
Installation
- Copy the neighbours directory containing the neighbours.module to the Drupal sites/all/modules/ directory (or another modules directory).
- Enable Neighbours in the "Administer > Site building > Modules" administration screen.
Enabling the Neighbours module will trigger the creation of the database schema. If you are shown error messages, you may have to create the schema by hand. Please see the database definition in neighbours.install.
- Enable the neighbours blocks you want in the "Administer > Site building > Blocks" administration screen.
- Apply neighbours settings on the "Administer > site configuration > neighbours" page.
Using neighbours
This module offers a "similarity" block that calculates on the fly the tag similarity between the current user and the user whose profile is being viewed. It therefore makes sense to limit the block's display to the path user/*.
A second block, "neighbours", provides the core feature. Placed beside a profile page (user/* again), it displays the users who are most similar to the viewed user in the relative frequency of the tags they give.
An additional user tab is offered that shows all calculated neighbours.
On a settings page the admin can configure calculation and appearance. The calculation process can be started by cron or by hand on an additional settings tab.
Calculating similarity
At this stage, similarities are derived from one vocabulary only.
For each tag, the relative frequency for each user is counted:
f(tag|user) = sum_nodes(tag,user) / sum_nodes_tags(user)
Read this as: the relative frequency f of a certain tag for a certain user is the number of that user's nodes carrying the tag, divided by the total count of the user's tags.
We store all values > 0 for further calculations. Zero-values can be dropped.
This is the percentage of cases in which a user has chosen a certain tag to describe a piece of his own content. It can be compared to the percentage with which another user has used the same tag to describe her content.
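The relative frequency can be illustrated with a minimal Python sketch. This is not the module's code: the function name tag_frequencies and the input shape (one list of tags per node) are assumptions made for illustration, and each tag is assumed to appear at most once per node.

```python
from collections import Counter


def tag_frequencies(tagged_nodes):
    """Return f(tag|user) for one user as a dict {tag: relative frequency}.

    tagged_nodes is a hypothetical input: one list of tags per node
    the user created. Tags the user never used are simply absent,
    which corresponds to dropping the zero values.
    """
    counts = Counter(tag for tags in tagged_nodes for tag in tags)
    total = sum(counts.values())
    if total == 0:
        return {}  # a user without tags has no frequencies at all
    return {tag: n / total for tag, n in counts.items()}


# Example: three nodes, five tag assignments in total.
alice_nodes = [["rock", "jazz"], ["rock"], ["rock", "blues"]]
alice_freq = tag_frequencies(alice_nodes)
```

Here "rock" accounts for three of the five tag assignments, and the frequencies of one user always sum to 1 (100 percent).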
In the next step we count how much of these percentages two users have in common. For each tag that two users share, we compute the tag-dependent similarity:
s(user1,user2|tag) = min(f(tag|user1), f(tag|user2))
If we sum over all tags, we get the overall percentage of commonly used tags:
s(user1,user2) = sum_tags(s(user1,user2|tag))
Now we order these values for a given user1, highest first, and the corresponding user2s become the neighbours with a relative similarity s(user1,user2).
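The two formulas and the ordering step can be sketched in Python as follows. The function names, the limit parameter, and the sample frequencies are hypothetical. Leaving out users with a similarity of zero is an assumption, not something the description above prescribes.

```python
def similarity(freq_a, freq_b):
    """s(user1,user2): sum over shared tags of the smaller frequency."""
    shared = freq_a.keys() & freq_b.keys()
    return sum(min(freq_a[tag], freq_b[tag]) for tag in shared)


def neighbours(user, freqs, limit=10):
    """Return (other_user, similarity) pairs for user, highest first.

    freqs maps each user to that user's {tag: relative frequency} dict.
    """
    scored = [
        (other, similarity(freqs[user], other_freq))
        for other, other_freq in freqs.items()
        if other != user
    ]
    # Assumption: users with nothing in common are not neighbours.
    scored = [pair for pair in scored if pair[1] > 0]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


# Hypothetical sample data; each user's frequencies sum to 1.
freqs = {
    "alice": {"rock": 0.6, "jazz": 0.2, "blues": 0.2},
    "bob": {"rock": 0.5, "pop": 0.5},
    "carol": {"jazz": 0.5, "blues": 0.5},
}
alice_neighbours = neighbours("alice", freqs)
```

With this sample data alice shares 50 percent of her tag usage with bob (via "rock") and 40 percent with carol (via "jazz" and "blues"), so bob is listed first.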
Notes:
- s(user1,user2) = s(user2,user1), so maybe we can save some calculations.
- If a user is my nearest neighbour, it is possible that I'm not hers!
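Both notes can be checked on a small example. The sketch below is only an illustration with made-up frequencies and hypothetical function names: it computes each unordered pair once, which is one possible way to use the symmetry, and then shows that the nearest-neighbour relation need not be mutual.

```python
from itertools import combinations


def pairwise_similarities(freqs):
    """Compute s(user1,user2) once per unordered pair of users."""
    scores = {}
    for a, b in combinations(sorted(freqs), 2):
        shared = freqs[a].keys() & freqs[b].keys()
        # frozenset keys make (a, b) and (b, a) the same entry.
        scores[frozenset((a, b))] = sum(
            min(freqs[a][tag], freqs[b][tag]) for tag in shared
        )
    return scores


def nearest(user, freqs, scores):
    """Return the user with the highest similarity to the given user."""
    others = [u for u in freqs if u != user]
    return max(others, key=lambda u: scores[frozenset((user, u))])


# Made-up frequencies for three users.
freqs = {
    "alice": {"jazz": 0.9, "rock": 0.1},
    "bob": {"rock": 0.8, "jazz": 0.2},
    "carol": {"rock": 1.0},
}
scores = pairwise_similarities(freqs)
alice_nearest = nearest("alice", freqs, scores)
bob_nearest = nearest("bob", freqs, scores)
```

In this example bob is alice's nearest neighbour, but bob's own nearest neighbour is carol, because bob and carol share far more of their tag usage than bob and alice do.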
Issues
The following issues still need to be thought through. If you are a mathematician, a developer, or someone who shares my love of neighbours, your help is greatly appreciated! :)
- Communities that are open to everybody allow both adults and kids to register. Different ages have different needs. To protect minors, an "age gap" should be implemented that makes it less probable that adults become neighbours of minors. To keep the similarity values interpretable, additional pseudo-terms should be generated that represent the nearness of the ages, like "same generation", "same year of birth", "similar birthday" etc. The difficulty is to merge these pseudo-terms with the real ones.
- With each calculation, all calculations that have been done before must be repeated. Another algorithm or another measure would be helpful, so that the neighbours can be recalculated from just the changes each user has made to her content.
- Use different vocabularies
- Create efficient db-tables: Discuss dependency on taxonomy_user