I have talked about taxonomy systems before. Now I believe I have finally envisioned the best taxonomy system and how to implement it in a database, or even how to create a specialized database for it.
It is basically built on OOP design. Tags are basically like classes.
All the classes considered are static classes: there are no operations, only descriptions. It's all data, and we want to keep the classes in the database.
Now, to make it look more like the new web classification, I will say "tag" instead of "class". (irony)
Each tag can have 3 different types of properties: sub-tag, super-tag and quantity.
Sub-tag: "Polygon" can be a sub-tag of "Polytope"; "Polytope" is then the super-tag. A sub-tag automatically inherits all the qualities (super-tags) and quantities of its super-tag.
Super-tag: Super-tags describe their sub-tags. All qualities are super-tags. Say a tag "Polygon" has the super-tag (quality) "Polytope": every quality "Polytope" has, "Polygon" will have too.
Quantity: numbers. A quantity is a form of super-tag, but quantities are special because there are infinitely many of them, so it's better to think about them as a separate group. Say a "Triangle" is a "Polygon" with "Vertex" equal to "3". ("Vertex" is a super-tag of "Polygon", and all the super-tags of "Vertex" still apply; "Vertex" just has the number "3" associated with it.)
A quantity just describes how many of a tag exist. The tag can be a unit, like "g": "kg" is a sub-tag of "g", and "g" is assigned the quantity 1000 for the tag "kg".
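To make the inheritance rule concrete, here is a minimal in-memory sketch in Python. The dictionaries `SUPER` and `QUANT` and the helper names are hypothetical, made up for illustration. It assumes that when the same quantity is defined at several levels, the definition nearest to the tag wins, which the description above does not actually specify.

```python
# Hypothetical in-memory model (names are illustrative, not from the post).
SUPER = {}   # tag -> set of direct super-tags
QUANT = {}   # (tag, super-tag) -> number attached to that link


def add_super(tag, sup, quantity=None):
    """Declare `sup` as a super-tag of `tag`, optionally with a quantity."""
    SUPER.setdefault(tag, set()).add(sup)
    if quantity is not None:
        QUANT[(tag, sup)] = quantity


def all_supers(tag):
    """Return inherited super-tags, nearest first (breadth-first walk)."""
    order = []
    queue = [tag]
    while queue:
        current = queue.pop(0)
        for sup in sorted(SUPER.get(current, ())):
            if sup not in order:
                order.append(sup)
                queue.append(sup)
    return order


def all_quantities(tag):
    """Collect quantities set on `tag` itself or inherited from super-tags."""
    result = {}
    for owner in [tag] + all_supers(tag):
        for (t, sup), q in QUANT.items():
            if t == owner:
                result.setdefault(sup, q)  # assumption: nearest definition wins
    return result


# The examples from the text.
add_super("Polygon", "Polytope")
add_super("Polygon", "Vertex")
add_super("Triangle", "Polygon")
add_super("Triangle", "Vertex", 3)
add_super("Equilateral Triangle", "Triangle")
add_super("kg", "g", 1000)

print(all_supers("Equilateral Triangle"))
print(all_quantities("Equilateral Triangle"))
```

In this sketch "Equilateral Triangle" never mentions "Vertex" itself; it picks up both the quality and the number 3 through "Triangle".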
I would not like to make this system too complicated, but here is one possible extension...
A quantity can be set as the domain of all possible quantity values. Like "Positive Integers", "{x| 1
Super-tag and sub-tag can be classified as one relation: only one of them is required to find the other. I personally believe it's good to have both, because quantity uses super-tags and doesn't use sub-tags.
Tags can be used to tag a node. A node can't be used to tag other nodes. That's the only difference between tags and nodes.
Implementation:
The theory... I say it's more advanced than Drupal's taxonomy! It introduces quantity and removes "related tags" (which don't show anything about the relation, they just say there *is* a relation. Well, everything has a relation with everything else... I still don't get why Drupal has not removed it from their source.)
The database design
There will be only item ids, because this is only a classification system. The data associated with the ids, like names and descriptions, is stored elsewhere.
Table: tag_sup
Fields: t_id, s_id
Provides a link from a tag to its super-tag.
Table: tag_quant
Fields: t_id, s_id, q
Associates a quantity with a tag and one of its super-tags.
Table: node_tag
Fields: n_id, t_id
Associates nodes with tags.
Table: node_quant
Fields: n_id, t_id, q
Associates nodes with tags and a quantity.
Great, the entire system in 4 tables. One might ask: if we pretend nodes are tags, wouldn't there be only 2 tables? True, but it will be easier to keep nodes and tags in separate tables for future searches.
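The four tables can be sketched with Python's built-in sqlite3 module. The table and field names come from the design above; the column types and the composite primary keys are assumptions added for the sketch, since the design only names the fields.

```python
import sqlite3


def create_schema(conn):
    """Create the four classification tables (types and keys are assumed)."""
    conn.executescript("""
        CREATE TABLE tag_sup (
            t_id INTEGER NOT NULL,   -- the tag
            s_id INTEGER NOT NULL,   -- one of its super-tags
            PRIMARY KEY (t_id, s_id)
        );
        CREATE TABLE tag_quant (
            t_id INTEGER NOT NULL,   -- the tag
            s_id INTEGER NOT NULL,   -- the super-tag being counted
            q    REAL    NOT NULL,   -- the quantity
            PRIMARY KEY (t_id, s_id)
        );
        CREATE TABLE node_tag (
            n_id INTEGER NOT NULL,   -- the node
            t_id INTEGER NOT NULL,   -- a tag on that node
            PRIMARY KEY (n_id, t_id)
        );
        CREATE TABLE node_quant (
            n_id INTEGER NOT NULL,   -- the node
            t_id INTEGER NOT NULL,   -- the tag being counted
            q    REAL    NOT NULL,   -- the quantity
            PRIMARY KEY (n_id, t_id)
        );
    """)


conn = sqlite3.connect(":memory:")
create_schema(conn)
```

The primary keys only stop the same link from being stored twice; nothing else in the design depends on them.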
That's all for databases.
A search is just following the path.
Say someone wants all nodes tagged with "Polygon" or its sub-tags. The program will go through the tag_sup table and find all the sub-tags of "Polygon". Then it goes to node_tag and finds all the n_id tagged with "Polygon" or any of those sub-tags. Done.
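One way to follow the path inside the database is a recursive query. The sketch below uses SQLite's `WITH RECURSIVE` through Python's sqlite3 module; the ids and sample rows are made up for illustration, and the function name `nodes_tagged_with` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tag_sup (t_id INTEGER, s_id INTEGER);
    CREATE TABLE node_tag (n_id INTEGER, t_id INTEGER);
    -- Hypothetical ids: 1 = Polytope, 2 = Polygon, 3 = Triangle, 4 = Square
    INSERT INTO tag_sup VALUES (2, 1), (3, 2), (4, 2);
    -- Node 10 is tagged Triangle, node 11 Polygon, node 12 only Polytope
    INSERT INTO node_tag VALUES (10, 3), (11, 2), (12, 1);
""")


def nodes_tagged_with(conn, tag_id):
    """Return ids of nodes tagged with `tag_id` or any of its sub-tags."""
    rows = conn.execute(
        """
        WITH RECURSIVE subtree(t_id) AS (
            SELECT ?                      -- start from the requested tag
            UNION                         -- UNION drops repeats, so loops end
            SELECT tag_sup.t_id
            FROM tag_sup JOIN subtree ON tag_sup.s_id = subtree.t_id
        )
        SELECT DISTINCT n_id
        FROM node_tag
        WHERE t_id IN (SELECT t_id FROM subtree)
        ORDER BY n_id
        """,
        (tag_id,),
    )
    return [row[0] for row in rows]


print(nodes_tagged_with(conn, 2))  # nodes under "Polygon": [10, 11]
```

Searching for "Polytope" (id 1) instead would also return node 12, because the walk then starts one level higher.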
To search by quantity, we can even use the database's numerical operators, like returning the results with quantity mod 2 = 1.
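As a small illustration, the "quantity mod 2 = 1" search could look like this with sqlite3. The ids and rows are invented, and `q` is stored as an integer here so that the `%` operator behaves as plain modulo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node_quant (n_id INTEGER, t_id INTEGER, q INTEGER);
    -- Hypothetical ids: tag 5 = Vertex; nodes 20..23 are shapes
    INSERT INTO node_quant VALUES (20, 5, 3), (21, 5, 4), (22, 5, 5), (23, 5, 6);
""")


def nodes_with_odd_quantity(conn, tag_id):
    """Return ids of nodes whose quantity for `tag_id` is odd."""
    rows = conn.execute(
        "SELECT n_id FROM node_quant WHERE t_id = ? AND q % 2 = 1 ORDER BY n_id",
        (tag_id,),
    )
    return [row[0] for row in rows]


print(nodes_with_odd_quantity(conn, 5))  # shapes with 3 and 5 vertices: [20, 22]
```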
Actually... the core is simple. The difficult part is making it user friendly and able to handle a few problems. But there are ways to solve them.
1. Users don't know a tag's id, they only know the tag's common name, like "Polygon" instead of "3442".
2. When users use common names, 2 words can mean the same thing, like "Film" and "Movie".
3. A word might mean different things under different topics, like "base".
4. One user might tag a newborn cat "kitten", another might tag it "cat". "kitten" is a sub-tag of "cat", and only the lowest tag is supposed to show up.
Ways to solve the problems:
1. Associate ids with common names.
table: tag_name
fields: t_id, n_id (n_id here refers to a name, not a node)
2. Use the table created above and add more n_id for the same tag, each associated with a name.
3. When 1 word has more than one meaning, it is associated with different n_id. The system should detect which n_id carry the different meanings and, according to a setting, either feed them back to the user so the user can choose which meaning (super-tag) he wants, or select the one most likely meant by the user.
4. Find an algorithm that removes super-tags, so that none of the tags remaining on a node are super-tags or sub-tags of each other.
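For point 4, one simple candidate is to collect every super-tag implied by the tags on a node and then drop any tag that appears in that set. The sketch below is just one possible approach, not a settled design. The names `SUPERS`, `all_supers` and `prune_to_lowest` are hypothetical, and it assumes the tag hierarchy has no cycles (with a cycle, every tag in the loop would be dropped).

```python
def all_supers(tag, supers):
    """Return every direct or inherited super-tag of `tag`."""
    seen = set()
    stack = [tag]
    while stack:
        for sup in supers.get(stack.pop(), ()):
            if sup not in seen:
                seen.add(sup)
                stack.append(sup)
    return seen


def prune_to_lowest(tags, supers):
    """Keep only tags that are not a super-tag of another tag on the node."""
    tags = set(tags)
    implied = set()
    for tag in tags:
        implied |= all_supers(tag, supers)
    return tags - implied


# Hypothetical hierarchy: kitten -> cat -> animal
SUPERS = {"kitten": {"cat"}, "cat": {"animal"}}

print(prune_to_lowest({"kitten", "cat", "animal"}, SUPERS))  # {'kitten'}
```

Unrelated tags are left alone: a node tagged "cat" and "base" keeps both, since neither is a super-tag of the other.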
As you can see, I still haven't organized everything in a very proper manner yet.
These are just some thoughts... I wish I could think of something better than the very messy quantity property, but it's not likely: there is no way to make all numbers into tags. It's more likely the quantity property will evolve into a logic property, like "under whatever condition, whatever will be classified as whatever". Please make suggestions if you can xD. The smartest way for most people is to just remove quantity and never address quantity in classification.