Recreating History: My Unconventional Way of Testing AI Models
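Not two days after DeepSeek V3.1 was released, Google also released a new model, said to beat Claude 3.7 at programming. Fortunately, this model can currently be used for free in Google AI Studio. There are all kinds of reviews online: some compare the models by the web front-ends they build, some by the animations they produce, and some by the small games they make. I'm going to compare them my own way.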
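As everyone knows, large language models are all-rounders when it comes to encyclopedic knowledge. Here I have them combine their historical knowledge with spatial reasoning and programming skills to complete a single task, and then compare the end results. The task is to write a Python script that runs in Blender and generates a scene containing a bastion fort, the famous fortification of early modern European warfare. This tests several of each model's abilities, as well as how well it combines them.
Here is my prompt: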
role
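An expert in early modern European military history, as well as a skilled programmer and 3D artist.
profile
Familiar with how European bastion forts were built, and skilled at making 3D models in Blender.
task
Using Blender's scripting feature and the bpy library, write a Python script that runs inside Blender and creates a 3D model of a European bastion fort.
Specific scene:
A large green plane represents the plain. The bastion fort sits in the middle of the plain, with some trees scattered around it.
Other
After finishing the script, briefly introduce European bastion forts and explain your design approach.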
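For readers unfamiliar with Blender scripting, the following is a minimal sketch of the kind of script this prompt asks for. It is not the output of any of the models tested. The helper names (`bastion_outline`, `prism_from_outline`, `tree_positions`, `build_scene`), the five-bastion layout and all dimensions are hypothetical choices, and the bastion shape is heavily simplified (no ditch, glacis or outworks). The geometry is plain Python so it can be checked outside Blender; the scene-building part only runs when `bpy` can be imported, i.e. inside Blender.

```python
import math
import random

try:
    import bpy  # only importable inside Blender's bundled Python
except ImportError:
    bpy = None


def bastion_outline(sides=5, r_curtain=20.0, r_shoulder=25.0, r_salient=34.0,
                    flank_angle=0.25, face_angle=0.2):
    """Return counter-clockwise (x, y) points of a simplified bastioned trace.

    Each corner of a regular polygon gets one arrow-shaped bastion:
    flank base -> shoulder -> salient point -> shoulder -> flank base.
    Angles (radians) are offsets from the corner's direction; the proportions
    are illustrative assumptions, not taken from any historical treatise.
    """
    pts = []
    for i in range(sides):
        theta = 2 * math.pi * i / sides
        for dtheta, r in ((-flank_angle, r_curtain), (-face_angle, r_shoulder),
                          (0.0, r_salient), (face_angle, r_shoulder),
                          (flank_angle, r_curtain)):
            pts.append((r * math.cos(theta + dtheta), r * math.sin(theta + dtheta)))
    return pts


def prism_from_outline(outline, height):
    """Extrude a closed 2D outline into (vertices, faces) for Mesh.from_pydata."""
    n = len(outline)
    verts = [(x, y, 0.0) for x, y in outline] + [(x, y, height) for x, y in outline]
    faces = [tuple(range(n - 1, -1, -1)),  # bottom cap, reversed so it faces down
             tuple(range(n, 2 * n))]       # top cap, counter-clockwise so it faces up
    faces += [(i, (i + 1) % n, n + (i + 1) % n, n + i) for i in range(n)]  # walls
    return verts, faces


def tree_positions(count, r_min, r_max, seed=42):
    """Scatter (x, y) points uniformly over an annulus around the fort."""
    rng = random.Random(seed)
    pts = []
    for _ in range(count):
        r = math.sqrt(rng.uniform(r_min ** 2, r_max ** 2))  # sqrt: uniform by area
        a = rng.uniform(0.0, 2 * math.pi)
        pts.append((r * math.cos(a), r * math.sin(a)))
    return pts


def make_material(name, rgba):
    mat = bpy.data.materials.new(name)
    mat.diffuse_color = rgba  # viewport colour, visible in Solid shading
    return mat


def build_scene():
    # The plain: one large green plane
    bpy.ops.mesh.primitive_plane_add(size=300.0, location=(0.0, 0.0, 0.0))
    bpy.context.active_object.data.materials.append(
        make_material("Grass", (0.15, 0.5, 0.1, 1.0)))

    # The fort: the bastioned outline extruded into a rampart
    verts, faces = prism_from_outline(bastion_outline(), height=6.0)
    mesh = bpy.data.meshes.new("BastionFort")
    mesh.from_pydata(verts, [], faces)
    mesh.update()
    fort = bpy.data.objects.new("BastionFort", mesh)
    bpy.context.collection.objects.link(fort)
    fort.data.materials.append(make_material("Stone", (0.55, 0.5, 0.45, 1.0)))

    # Trees: a trunk (z 0..3) topped by a cone crown (z 3..9), outside the fort
    leaves = make_material("Leaves", (0.05, 0.3, 0.05, 1.0))
    for x, y in tree_positions(40, r_min=60.0, r_max=140.0):
        bpy.ops.mesh.primitive_cylinder_add(radius=0.4, depth=3.0, location=(x, y, 1.5))
        bpy.ops.mesh.primitive_cone_add(radius1=2.0, depth=6.0, location=(x, y, 6.0))
        bpy.context.active_object.data.materials.append(leaves)


if bpy is not None:
    build_scene()
```

Pasted into Blender's Text Editor and run, it should add the plain, the fort and the trees to the active collection. Keeping the geometry in pure functions makes it easy to swap in a more faithful trace later without touching the Blender calls.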
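Results from each AI:
1. Claude 3.7
2. DeepSeek-R1
3. Gemini 2.5 Pro
Google's model seemed to treat the task somewhat perfunctorily.
4. DeepSeek V3.1
The scene is rather small, and the fort was handed over before it was finished.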
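The final verdict is clear at a glance. Claude 3.7 did best, with the richest detail. R1 came second, but well behind first place: the water in its moat even sits above ground level.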
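R1's moat illustrates a simple constraint a script has to respect: the water surface of a ditch must lie below the surrounding ground. A hypothetical helper like the one below makes that explicit by deriving both the ditch bottom and the water height from the ground height. The names `moat_levels`, `ditch_depth` and `freeboard` and the default values are illustrative assumptions, not taken from any model's script.

```python
def moat_levels(ground_z=0.0, ditch_depth=4.0, freeboard=1.0):
    """Return (ditch_bottom_z, water_surface_z) for a wet ditch dug into flat ground.

    Hypothetical helper: both heights are derived from ground_z, so the
    water surface always ends up `freeboard` units below the plain.
    """
    if not 0.0 < freeboard < ditch_depth:
        raise ValueError("freeboard must lie strictly between 0 and ditch_depth")
    ditch_bottom_z = ground_z - ditch_depth
    water_surface_z = ground_z - freeboard
    return ditch_bottom_z, water_surface_z


# Example: plain at z = 0, a 4-unit-deep ditch, water 1 unit below the rim
bottom_z, water_z = moat_levels()
print(bottom_z, water_z)  # -4.0 -1.0
```

Note that in Blender a water plane placed at that height is only visible if the ground plane has an opening over the ditch (for example one cut with a Boolean modifier); otherwise the green plain simply covers it.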