1,868 questions
2
votes
0
answers
89
views
How to correctly calculate distance and similarity for each step in hierarchical clustering (Ward.D2)?
I am grouping my data using the ward.D2 hierarchical clustering method in R.
I need to calculate the distance and similarity for each step, from 2 to 20 clusters.
Similarity is calculated using the ...
1
vote
1
answer
146
views
Neo4j vector similarity function
I'm trying to understand the difference between the vector.similarity.cosine Cypher function and the gds.similarity.cosine function in Neo4j. According to the Neo4j documentation, both are used to ...
1
vote
1
answer
178
views
Rapidfuzz giving no matches but Fuzzywuzzy does
I have been developing a matching system which matches the rows of the client and our central database depending on similarity. I have used a hybrid approach where I needed to somehow map the Company, ...
1
vote
0
answers
148
views
How to compute text–image similarity under local inference with generative vision-language models (e.g. Qwen2.5-VL, Gemma 3)?
I’ve been working with Qwen2.5-VL and Gemma3 locally, and I need to measure the similarity between text and image embeddings—similar to CLIP/SigLIP—but I’m resource-limited and can’t spin up ...
1
vote
3
answers
91
views
In sql, group by using similar group_name
How can I perform a GROUP BY in SQL when the group_name values are similar but not exactly the same?
In my dataset, the group_name values may differ slightly (e.g., "Apple Inc.", "...
0
votes
0
answers
86
views
Use terra to calculate relative similarity of raster values between areas inside and outside of a group of polygons in R
This question builds on a helpful solution provided for calculating uniqueness across a SpatRaster -- Using terra in R to calculate & map similarity and uniqueness across cells of a large GDM-...
0
votes
1
answer
184
views
Using terra in R to calculate & map (dis)similarity and uniqueness across cells of a large GDM-based raster
I am looking to employ terra to update an analytical workflow (from Mokany et al 2022; Glob Ecol and Biogeo) originally written in R with the raster package. The workflow involves spatial analyses of ...
0
votes
0
answers
36
views
Near Similarity and duplication detection
I have a ticketing system where people create tickets for their issues. When someone tries to create a new ticket, I have to search my Elasticsearch index to identify whether a similar ticket is already ...
0
votes
1
answer
67
views
how to compare lists with their frequency
I'm interested in finding similarities between two lists. I have the count of duplicates in the first column, and the pattern is in the second column. What would be the most logical way to compare ...
0
votes
0
answers
97
views
wrong similarity search result on chroma db
I am appending each row of a CSV file to ChromaDB in a format such as the one below:
#Acme1 prod #Acme1 line, AC1
#Acme2 prod #Acme2 line, AC2
embedding = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-...
0
votes
1
answer
69
views
Catalog sentences into 5 words that represent them
I have a dataframe with 1000 text rows in df['text'].
I also have 5 words, and for each of them I want to know how well it represents the text (a score between 0 and 1).
every score will be in df["word1&...
1
vote
1
answer
56
views
Visualize species occurrence similarities of 5x5 grid-cells and soil samples in R
I have two data frames:
ref_df, containing in each row information about a species and the latitude and longitude where it was recorded, and
sample_df with sample names as rows and species names as columns,...
0
votes
1
answer
72
views
Similarity from a word to a sentence after computing word embeddings
I have a dataframe with 1000 text rows.
I trained word2vec on it.
Now I want to create a new field that gives me the distance from each sentence to a word of my choice, say the word "king".
I ...
3
votes
1
answer
61
views
How can one obtain the "correct" embedding layer in BERT?
I want to utilize BERT to assess the similarity between two pieces of text:
from transformers import AutoTokenizer, AutoModel
import torch
import torch.nn.functional as F
import numpy as np
tokenizer ...
1
vote
0
answers
31
views
Ideal method to do image similarity comparison between two binary images - Tactile Maps
I have a task for image similarity comparisons between two binary images of tactile maps (maps for the visually impaired to map out an area). The goal is to have an output score of how similar the two ...
0
votes
0
answers
27
views
Compare two blob column values to find percentage of similarity
How can I compare two blob columns in the same table?
For example: PIC1 contains a picture of a bridge taken from 5 meters away, whereas PIC2 contains a picture of the same bridge taken from 10 meter ...
1
vote
1
answer
49
views
Fetch rows from PostgreSQL with rearranged words similar to a given string
I want to retrieve all rows from a PostgreSQL database that contain sentences similar to a provided string. The sentences in the database can have their words in any order (rearranged). How can I ...
1
vote
0
answers
36
views
Directed Graph Edit Distance Computation Issue Using AStar Algorithm in Graph-Matching-Toolkit
I am comparing pairs of identical directed graphs represented in GXL format using the Graph-Matching-Toolkit (https://github.com/dzambon/graph-matching-toolkit). Since the graphs are identical, I ...
0
votes
1
answer
35
views
Error in reconstruction function of SA attack algorithm for CB systems
I am trying to run the code at the following link:
https://github.com/biometricsecurity/Preimage-attack-on-BTP-template
The code aims to measure similarity attacks on CB systems. I have ...
1
vote
1
answer
41
views
How to create a list of comparisons with an agentset?
The agents (called farms) in my model have peers (an agentset of farms they have links to). Each farm has an attribute called rotation, which is a list of values between 0 and 1 (same length for all ...
0
votes
1
answer
121
views
Implementation of Angular Metric for Shape Similarity (AMSS) with Python
Is there any existing optimized Python implementation of Angular Metric for Shape Similarity (AMSS)?
Otherwise, could I approximate it by considering the derivative DTW and using cosine similarity ...
1
vote
0
answers
47
views
Mismatch between Milvus Id and Filename
I am trying to build a small image search program and using Milvus as a database to store my embeddings, on trying to retrieve the result by matching the vector embeddings with the embeddings obtained ...
0
votes
1
answer
480
views
Why are my dimensions different when using OpenAi embeddings in Python?
I have a single Python function that I am using to embed JSON objects of different lengths. The issue I am having is that, somehow, the dimensions are different when comparing the vectors and I ...
1
vote
0
answers
56
views
How can I store boolean array of size 3000 efficiently in milvus?
I have a boolean array which represents store-availability of retail products across 3000 different stores.
My schema looks like this:
product_id = FieldSchema(
    name="product_id",
    ...
0
votes
0
answers
232
views
How can I perform accurate vector search for complex objects?
I have objects that have many attributes, for example:
item = {
    'id': 123,
    'name': 'Keyboard',
    'price': 12,
    'url': 'example.com',
    'description': 'a keyboard that etc...',
    'details': {
        'color': '...
1
vote
0
answers
85
views
How can one output n vectors with unique metadata in a query with ChromaDB?
Following on the example here, one way to create a query of the collection from ChromaDB with filtering by a given type of metadata (i.e. "source_type") is
results = collection.query(
...
0
votes
1
answer
191
views
Plot upper triangle correlation matrix with similarity scores using ggplot
I have a dataframe as given below:
The table only has values from the upper triangle of a matrix.
I want to plot a correlation plot (correlogram) where the colours show the correlation and size ...
0
votes
0
answers
118
views
Multi-attribute similarity search across millions of records based on criteria
Problem description: I am trying to perform an efficient multi-attribute similarity search across millions of records in a database. However, my process requires a hierarchical order of criteria for ...
1
vote
0
answers
79
views
Is it possible to compare multiple line graphs to give a sort of 'similarity rating'?
I am trying to measure data from a smartphone ambient light sensor (ALS). My goal is to be able to look at the data and infer the location of the device.
To do this my plan is ...
1
vote
2
answers
145
views
similarity between two numpy arrays based on shape but not distance
import matplotlib.pyplot as plt
import numpy as np
from numpy.linalg import norm
def cosine_similarity(arr1: np.ndarray, arr2: np.ndarray) -> float:
    dot_product = np.dot(arr1, arr2)
    magnitude =...
1
vote
2
answers
136
views
How to detect if two sentences are similar, not in meaning, but in syllables/words?
Here are some examples of the types of sentences that need to be considered "similar":
there was a most extraordinary noise going on shrinking rapidly she soon made out
there was a most ...
1
vote
0
answers
644
views
Langchain FAISS | Any solutions or alternatives for similarity search on vector DBs for slightly repetitive short words with numerics?
I am trying to search a cell line vector database, using LangChain, that has entries that look like this:
ID: 253F1
AC: CVCL_B513
SY: NA
OX: NCBI_TaxID=9606; ! Homo sapiens (Human)
CA: ...
-2
votes
1
answer
46
views
I have plots of points that I extract from an image. How can I determine a similarity measure between two different plots? [closed]
Each point has an x, y, and size.
For example these should result in similar:
Plot 1-A:
Plot 1-B:
And these should not result in similar:
Plot 2-A:
Plot 3-A:
Are there any algorithms or ways to ...
0
votes
2
answers
186
views
Shared triples between two knowledge graphs
I want to compare two semantic Knowledge Graphs, to see if they have any triple in common, using cypher.
MATCH (n1)-[r1]-(c1)
MATCH (n2)-[r2]-(c2)
WHERE r1.filePath = "../data/graph1.json" ...
0
votes
1
answer
75
views
record matching/similarity calculation for numbers and characters
I have a dataframe structured the following way, with many more rows and columns:
Report_ID  Block_ID  Number  Character
1          1         5       A
2          1         3       A
3          1         2       B
4          2         10      A
5          2         11      B
6          2         100     C
7          3         2       D
8          3         #NA     A
9          3         8       D
10         3         ...
1
vote
2
answers
97
views
VBA collect consecutive similar cells in the row
I have a list of nonconformities that appeared at different times with different products, and I need to find similar problems.
I have already sorted the list.
Now I need to get a new sheet with similar rows with ...
2
votes
1
answer
355
views
Textual similarity between two tags in Nodejs
I want to rate the similarity between two tags. For example, the words technology, computer and chip should have high similarity with one another, while a word like food should have low similarity to them.
Given the recent ...
2
votes
0
answers
38
views
Get similarity within a column based on another column
I have a table with three columns: Source, Target, Similarity. The first two are strings, the last one is a float. This table came about by comparing source elements and target elements and finding ...
0
votes
1
answer
2k
views
SSIM (Structural similarity index measure) performance
I have a reference image A and 2 target images B and C, and I tried to measure the SSIM as follows:
(from a human vision perception A & B are from the same class) and A & C from different ...
1
vote
0
answers
128
views
Similarity search in a python database using rdkit
How can I run a similarity search in a database so that the output is a table of the molecules that passed a specific threshold?
I tried this
query = sql.SQL("""
SELECT *, ...
0
votes
0
answers
72
views
How to quickly query similar text through postgresql?
I need to query the top few texts that are most similar based on the input content.
The table structure is as follows:
create table documents (
id bigserial primary ...
0
votes
1
answer
79
views
Trying to skip columns in loop if requirement isn't satisfied
I have Python code that tries to find similarities between brands based on the types of products they sell and at what price point. One issue I'm running into is price, sales, and number of products ...
0
votes
2
answers
107
views
Creating a similarity matrix with jagged arrays
I have a dataframe like this:
id      action               enc
Cell 1  run,swim,walk        1,2,3
Cell 2  swim,climb,surf,gym  2,4,5,6
Cell 3  jog,run              7,1
This table goes on for roughly 30k rows. After gathering all these actions, ...
1
vote
0
answers
46
views
Find proportion of two vectors/arrays which overlap
I have two vectors, for example:
a = 25,26,37,36,27,33,104,44,40,49,45,48,50,55,56,59,54,57,105,64,73,76,72,67,68,71,78,82,77,79,86,84,83,85,91,92,96,97,102,101,93,98,99,100,94,95,88,87,65,66,90,89,80,...
-3
votes
2
answers
3k
views
Meaning behind 'thefuzz' / 'rapidfuzz' similarity metric when comparing strings
When using thefuzz in Python to calculate a simple ratio between two strings, a result of 0 means they are totally different while a result of 100 represents a 100% match. What do intermediate results ...
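One way to build intuition for intermediate scores, offered as a sketch rather than as either library's actual code: as far as I know, rapidfuzz describes its simple ratio as a normalized Indel similarity, which can be written as 100 * 2 * LCS / (len(a) + len(b)), where LCS is the length of the longest common subsequence. The helper names `lcs_length` and `indel_ratio` below are hypothetical and written only for illustration; thefuzz may produce slightly different numbers depending on its backend.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def indel_ratio(a: str, b: str) -> float:
    """Illustrative 0-100 score: share of characters that belong to a common subsequence."""
    total = len(a) + len(b)
    if total == 0:
        return 100.0  # assumption: two empty strings count as identical
    return 100.0 * 2 * lcs_length(a, b) / total


if __name__ == "__main__":
    print(round(indel_ratio("this is a test", "this is a test!"), 2))
    print(indel_ratio("abc", "xyz"))
```

Read this way, an intermediate score is roughly the percentage of characters, counted over both strings, that do not have to be inserted or deleted to turn one string into the other.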
1
vote
0
answers
44
views
Algorithm to generate top N% of permutations with most dissimilarities
Let us consider a set of data with, say, 10 values. We need to estimate its characteristics by the Monte Carlo method, that is, with a large number of randomly generated permuted sets.
If we'd consider ...
1
vote
1
answer
96
views
Maximum jaccard similarity in igraph
From the igraph documentation: "The Jaccard similarity coefficient of two vertices is the number of common neighbors divided by the number of vertices that are neighbors of at least one of the ...
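The quoted definition can be checked by hand on a small graph. The following is a minimal plain-Python sketch of that definition, not igraph's implementation; the function name `jaccard_similarity`, the adjacency-dictionary representation, and the choice to return 0.0 when both vertices have no neighbors are assumptions made for illustration.

```python
def jaccard_similarity(adjacency: dict, u, v) -> float:
    """Common neighbors of u and v divided by vertices adjacent to at least one of them."""
    neighbors_u = set(adjacency.get(u, ()))
    neighbors_v = set(adjacency.get(v, ()))
    union = neighbors_u | neighbors_v
    if not union:
        return 0.0  # assumption: two isolated vertices get similarity 0
    return len(neighbors_u & neighbors_v) / len(union)


# Hypothetical undirected toy graph, stored as vertex -> set of neighbors.
graph = {
    "a": {"c", "d"},
    "b": {"c", "d"},
    "c": {"a", "b"},
    "d": {"a", "b", "e"},
    "e": {"d"},
}

if __name__ == "__main__":
    print(jaccard_similarity(graph, "a", "b"))
    print(jaccard_similarity(graph, "a", "e"))
```

In this sketch the value 1.0 is reached exactly when two vertices have identical, non-empty neighbor sets, as "a" and "b" do here.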
0
votes
2
answers
1k
views
Langchain: ChromaDB: Not able to retrieve large numbers of PDF files vector database from Chroma persistence directory
My program chats with PDF files in a directory. Surprisingly, the code works if there are 5 PDF files of 1 page each in the directory, but it doesn't work when there are 1000 files of 1 page each. It ...
2
votes
2
answers
104
views
Cosine similarity between each two rows in a dataframe
I have a data frame called text with two columns, year and text. Find the dput output below for an example:
text <- structure(list(year = 2000:2007, text = c("I went to McDonald's and they ...
1
vote
1
answer
99
views
How to group redundant entries of a dictionary
I have a dictionary of entries like this:
{
    'A': {
        'HUE_SAT': 1,
        'GROUP_INPUT': 1,
        'GROUP_OUTPUT': 1
    },
    'D': {
        'HUE_SAT': 1,
        'GROUP_INPUT': 1,
        ...