Go: Removing duplicate rows after SQL join result

huangapple go评论101阅读模式
英文:

Go: Removing duplicate rows after SQL join result

问题

我正在运行一个联合的SQL查询,查询地点和发生在这些地点的事件。在结果中,由于地点和事件之间是一对多的关系,地点数据会在每一行中重复出现。

清理重复的地点数据的最佳方法是什么?

在保持单个SQL操作的情况下,最合理的方法是在循环遍历查询结果(行)时进行检查。

然而,我似乎无法访问地点对象以检查是否存在先前的地点ID。

编辑:
这是SQL输出。如您所见,由于地点在事件之间共享,地点数据自然会多次出现。最终,这将作为JSON发送出去,其中包含嵌套的结构,一个用于地点,一个用于事件。

id 	title 		    latlng 				    id 	title 	        locationid 	
1 	Fox Thea... 	43.6640673,-79.4213863 	1 	Bob's Event		1
1 	Fox Thea... 	43.6640673,-79.4213863 	2 	Jill's Event 	1
2 	Wrigley ... 	43.6640673,-79.4213863 	3 	Mary's Event 	2
3 	Blues Bar 	    43.6640673,-79.4213863 	4 	John's Event 	3
1 	Fox Thea... 	43.6640673,-79.4213863 	5 	Monthly G... 	1
1 	Fox Thea... 	43.6640673,-79.4213863 	6 	A Special... 	1
1 	Fox Thea... 	43.6640673,-79.4213863 	7 	The Final... 	1

JSON输出。如您所见,地点数据被复制,导致JSON文件变大。

   {
        "Locations": [
            {
                "ID": 1,
                "Title": "Fox Theatre",
                "Latlng": "43.6640673,-79.4213863",
            },
            {
                "ID": 1,
                "Title": "Fox Theatre",
                "Latlng": "43.6640673,-79.4213863",
            },
            {
                "ID": 2,
                "Title": "Wrigley Field",
                "Latlng": "43.6640673,-79.4213863",
            },
            {
                "ID": 3,
                "Title": "Blues Bar",
                "Latlng": "43.6640673,-79.4213863",
            },
            {
                "ID": 1,
                "Title": "Fox Theatre",
                "Latlng": "43.6640673,-79.4213863",
            },
            {
                "ID": 1,
                "Title": "Fox Theatre",
                "Latlng": "43.6640673,-79.4213863",
            },
            {
                "ID": 1,
                "Title": "Fox Theatre",
                "Latlng": "43.6640673,-79.4213863",
            }
        ],
        "Events": [
            {
                "ID": 1,
                "Title": "Bob's Event",
                "Location": 1
            },
            {
                "ID": 2,
                "Title": "Jill's Event",
                "Location": 1
            },
            {
                "ID": 3,
                "Title": "Mary's Event",
                "Location": 2
            },
            {
                "ID": 4,
                "Title": "John's Event",
                "Location": 3
            },
            {
                "ID": 5,
                "Title": "Monthly Gathering",
                "Location": 1
            },
            {
                "ID": 6,
                "Title": "A Special Event",
                "Location": 1
            },
            {
                "ID": 7,
                "Title": "The Final Contest",
                "Location": 1
            }
        ]
    
    }

结构体:

// 事件类型
type Event struct {
	ID int `schema:"id"`
	Title string `schema:"title"`
	LocationID int `schema:"locationid"`
}

// 地点类型
type Location struct {
	ID int `schema:"id"`
	Title string `schema:"title"`
	Latlng string `schema:"latlng"`
}

// 地点和事件类型
type LocationsEvents struct {
	Locations []Location `schema:"locations"`
	Events []Event `schema:"events"`
}

运行查询并遍历行的函数:

func getLocationsEvents(db *sql.DB, start, count int) ([]Location, []Event, error) {

	var locations = []Location{}
	var events = []Event{}

	rows, err := db.Query("SELECT locations.id, locations.title, locations.latlng, events.id, events.title, events.locationid FROM locations LEFT JOIN events ON locations.id = events.locationid LIMIT ? OFFSET ?", count, start)
	if err != nil {
		return locations, events, err
	}
	defer rows.Close()

	for rows.Next() {
		var location Location
		var event Event

		err := rows.Scan(&location.ID, &location.Title, &location.Latlng, &event.ID, &event.Title, &event.LocationID)
		if err != nil {
				return locations, events, err
		}
		
    // 在这里我可以打印locations并看到它在每次循环迭代时变长
	fmt.Println(locations)

	// 我如何检查locations中是否存在一个ID?
	// 理想情况下,如果location.ID已经存在于locations中,那么只追加event,否则,同时追加location和event

		locations = append(locations, location)
		events = append(events, event)
	}

	return locations, events, nil
}

路由器调用的函数:

func (a *App) getLocationsEventsJSON(w http.ResponseWriter, r *http.Request) {

count := 99
start := 0

    if count > 10 || count < 1 {
        count = 10
    }
    if start < 0 {
        start = 0
    }

    locations, events, err := getLocationsEvents(a.DB, start, count)
    if err != nil {
        respondWithError(w, http.StatusInternalServerError, err.Error())
        return
    }

	var locationsEvents LocationsEvents

	locationsEvents.Locations = locations
	locationsEvents.Events = events

    respondWithJSON(w, http.StatusOK, locationsEvents)
}

将数据以JSON格式发送出去的函数(REST API的一部分):

func respondWithJSON(w http.ResponseWriter, code int, payload interface{}) {
    response, _ := json.Marshal(payload)

    w.Header().Set("Content-Type", "application/json")
    w.WriteHeader(code)
    w.Write(response)
}

更新:

如果回到使用SQL查询的方式,有哪些可能性?使用GROUP BY吗?以下是一个示例SQL查询:

SELECT locations.id, locations.title, locations.latlng, events.id, events.title, events.locationid
FROM locations
LEFT JOIN events ON locations.id = events.locationid
GROUP BY locations.id, events.id

结果集仍然包含重复的地点数据,但已经按组进行了排序。

然后还有子查询的可能性:http://www.w3resource.com/sql/subqueries/understanding-sql-subqueries.php,但现在我正在运行多个SQL查询,这是我想要避免的。

实际上,我认为在使用像我这样的单个连接查询时,无法避免地点数据的重复。否则,如何接收连接数据的结果集,而不会复制地点数据?让SQL服务器在需要时将预先制作的JSON数据发送给我(分开地点和事件)?根据我的理解,在接收结果后进行这项工作更好。

英文:

I’m running a joined SQL query for locations and events (occuring at the locations). In the results, naturally the location data is replicated per row, as there’s a one-to-many relationship: one location holds multiple events.

What’s an optimal approach to clean up the multiplied location data?

Staying with a single SQL operation, what makes the most sense is performing a check while looping through the query results (rows).

However I cannot seem to access the locations object to check for a pre-existing location ID.

Edit:
This is the SQL output. As you see, location data naturally occurs multiple times, because it's shared across events. Ultimately this will be sent out as JSON eventually, with nested structs, one for locations, one for events.

id 	title 		    latlng 				    id 	title 	        locationid 	
1 	Fox Thea... 	43.6640673,-79.4213863 	1 	Bob&#39;s Event		1
1 	Fox Thea... 	43.6640673,-79.4213863 	2 	Jill&#39;s Event 	1
2 	Wrigley ... 	43.6640673,-79.4213863 	3 	Mary&#39;s Event 	2
3 	Blues Bar 	    43.6640673,-79.4213863 	4 	John&#39;s Event 	3
1 	Fox Thea... 	43.6640673,-79.4213863 	5 	Monthly G... 	1
1 	Fox Thea... 	43.6640673,-79.4213863 	6 	A Special... 	1
1 	Fox Thea... 	43.6640673,-79.4213863 	7 	The Final... 	1

The JSON output. As you see location data is multiplied making for a larger JSON file.

   {
        &quot;Locations&quot;: [
            {
                &quot;ID&quot;: 1,
                &quot;Title&quot;: &quot;Fox Theatre&quot;,
                &quot;Latlng&quot;: &quot;43.6640673,-79.4213863&quot;,
            },
            {
                &quot;ID&quot;: 1,
                &quot;Title&quot;: &quot;Fox Theatre&quot;,
                &quot;Latlng&quot;: &quot;43.6640673,-79.4213863&quot;,
            },
            {
                &quot;ID&quot;: 2,
                &quot;Title&quot;: &quot;Wrigley Field&quot;,
                &quot;Latlng&quot;: &quot;43.6640673,-79.4213863&quot;,
            },
            {
                &quot;ID&quot;: 3,
                &quot;Title&quot;: &quot;Blues Bar&quot;,
                &quot;Latlng&quot;: &quot;43.6640673,-79.4213863&quot;,
            },
            {
                &quot;ID&quot;: 1,
                &quot;Title&quot;: &quot;Fox Theatre&quot;,
                &quot;Latlng&quot;: &quot;43.6640673,-79.4213863&quot;,
            },
            {
                &quot;ID&quot;: 1,
                &quot;Title&quot;: &quot;Fox Theatre&quot;,
                &quot;Latlng&quot;: &quot;43.6640673,-79.4213863&quot;,
            },
            {
                &quot;ID&quot;: 1,
                &quot;Title&quot;: &quot;Fox Theatre&quot;,
                &quot;Latlng&quot;: &quot;43.6640673,-79.4213863&quot;,
            }
        ],
        &quot;Events&quot;: [
            {
                &quot;ID&quot;: 1,
                &quot;Title&quot;: &quot;Bob&#39;s Event&quot;,
                &quot;Location&quot;: 1
            },
            {
                &quot;ID&quot;: 2,
                &quot;Title&quot;: &quot;Jill&#39;s Event&quot;,
                &quot;Location&quot;: 1
            },
            {
                &quot;ID&quot;: 3,
                &quot;Title&quot;: &quot;Mary&#39;s Event&quot;,
                &quot;Location&quot;: 2
            },
            {
                &quot;ID&quot;: 4,
                &quot;Title&quot;: &quot;John&#39;s Event&quot;,
                &quot;Location&quot;: 3
            },
            {
                &quot;ID&quot;: 5,
                &quot;Title&quot;: &quot;Monthly Gathering&quot;,
                &quot;Location&quot;: 1
            },
            {
                &quot;ID&quot;: 6,
                &quot;Title&quot;: &quot;A Special Event&quot;,
                &quot;Location&quot;: 1
            },
            {
                &quot;ID&quot;: 7,
                &quot;Title&quot;: &quot;The Final Contest&quot;,
                &quot;Location&quot;: 1
            }
        ]
    
    }

Structs:

// Event type
type Event struct {
	ID int `schema:&quot;id&quot;`
	Title string `schema:&quot;title&quot;`
	LocationID int `schema:&quot;locationid&quot;`
}

// Location type
type Location struct {
	ID int `schema:&quot;id&quot;`
	Title string `schema:&quot;title&quot;`
	Latlng string `schema:&quot;latlng&quot;`
}

// LocationsEvents type
type LocationsEvents struct {
	Locations []Location `schema:&quot;locations&quot;`
	Events []Event `schema:&quot;events&quot;`
}

Function running the query and looping through rows:

func getLocationsEvents(db *sql.DB, start, count int) ([]Location, []Event, error) {

	var locations = []Location{}
	var events = []Event{}

	rows, err := db.Query(&quot;SELECT locations.id, locations.title, locations.latlng, events.id, events.title, events.locationid FROM locations LEFT JOIN events ON locations.id = events.locationid LIMIT ? OFFSET ?&quot;, count, start)
	if err != nil {
		return locations, events, err
	}
	defer rows.Close()

	for rows.Next() {
		var location Location
		var event Event

		err := rows.Scan(&amp;location.ID, &amp;location.Title, &amp;location.Latlng, &amp;event.ID, &amp;event.Title, &amp;event.LocationID);
		if err != nil {
				return locations, events, err
		}
		
    // Here I can print locations and see it getting longer with each loop iteration
	fmt.Println(locations)

	// How can I check if an ID exists in locations?
	// Ideally, if location.ID already exists in locations, then only append event, otherwise, append both the location and event

		locations = append(locations, location)
		events = append(events, event)
	}

	return locations, events, nil
}

Function called on by router:

func (a *App) getLocationsEventsJSON(w http.ResponseWriter, r *http.Request) {

count := 99
start := 0

    if count &gt; 10 || count &lt; 1 {
        count = 10
    }
    if start &lt; 0 {
        start = 0
    }

    locations, events, err := getLocationsEvents(a.DB, start, count)
    if err != nil {
        respondWithError(w, http.StatusInternalServerError, err.Error())
        return
    }

	var locationsEvents LocationsEvents

	locationsEvents.Locations = locations
	locationsEvents.Events = events

    respondWithJSON(w, http.StatusOK, locationsEvents)
}

Function sending data out as JSON (part of REST API):

func respondWithJSON(w http.ResponseWriter, code int, payload interface{}) {
    response, _ := json.Marshal(payload)

    w.Header().Set(&quot;Content-Type&quot;, &quot;application/json&quot;)
    w.WriteHeader(code)
    w.Write(response)
}

UPDATE:

Reverting to doing this with the SQL query, what are the possibilities? Using GROUP BY? Here is an example SQL:

SELECT locations.id, locations.title, locations.latlng, events.id, events.title, events.locationid
FROM locations
LEFT JOIN events ON locations.id = events.locationid
GROUP BY locations.id, events.id

The result set still contains duplicated location data, however it's nicely grouped and sorted.

Then there's the possibility of sub-queries:
http://www.w3resource.com/sql/subqueries/understanding-sql-subqueries.php but now I'm running multiple SQL queries, something I wanted to avoid.

In reality I don't think I can avoid the duplicated location data when using a single join query like I am. How else would I receive a resultset of joined data, without having location data replicated? Having the SQL server send me pre-made JSON data as I need it (locations and events seperated)? From my understanding it's better doing that work after receiving results.

答案1

得分: 2

我认为你可以将你的请求分为两部分:位置(SELECT * FROM locations)和事件(SELECT * FROM events),然后将它们传递给JSON编组器。这两个请求对于数据库来说非常简单和快速。接下来,它们将更容易缓存中间结果。

但是现在我正在运行多个SQL查询,这是我想要避免的。

请你澄清一下这一点——为什么你想要避免多个查询?你想要解决什么任务,有什么限制?有时候一系列简单的查询比一个过于复杂的查询更好。

英文:

I think you can split your request in two: locations (SELECT * FROM locations) and events (SELECT * FROM events) and then pass them to JSON marshaller.
These 2 requests will be very easy and fast for database to perform. Next they will be easier to cache intermediate results.

> but now I'm running multiple SQL queries, something I wanted to avoid.

Could you pls clarify this moment - why do you want to avoid multiple queries? What task do you want to solve and what limitations have? Sometimes set of small easy queries are better than one overcomplicated.

答案2

得分: 0

如果您自己查询数据库,应该能够避免重复项。在查询的末尾添加"GROUP BY {唯一字段}"。

以下是一个示例,可以给出您活动列表上的唯一位置列表:

SELECT location.*
FROM location.ID, location.Title, location.Latlng
INNER JOIN event ON event.ID=location.ID
GROUP BY location.ID

英文:

If you are querying the database yourself, you should be able to avoid any duplicates in the first place.
In the end of your query add "GROUP BY {unique field}".

Example that should give a unique list of locations that are on you event list

SELECT location.* 
FROM location.ID, location.Title, location.Latlng
  INNER JOIN event ON event.ID=location.ID
GROUP BY location.ID

huangapple
  • 本文由 发表于 2017年7月3日 03:41:31
  • 转载请务必保留本文链接:https://go.coder-hub.com/44874566.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定